There are sequential dependencies, so you can't just arbitrarily increase speed by parallelizing over more GPUs. Every token depends on all previous tokens, and every layer depends on the layers before it. You can arbitrarily slow a model down by using fewer, slower GPUs (or none at all), though.
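The sequential dependency can be sketched with a toy autoregressive loop (pure Python, no real model; `next_token` is a stand-in for a full forward pass):

```python
# Autoregressive decoding is inherently sequential: token t is a
# function of all tokens < t, so the time dimension can't be
# parallelized the way batch or model width can.

def next_token(context):
    # Stand-in for a forward pass; any deterministic function of the
    # entire prefix illustrates the dependency.
    return sum(context) % 7

def generate(prompt, n_new):
    tokens = list(prompt)
    for _ in range(n_new):
        # Must wait for every prior token before producing the next one.
        tokens.append(next_token(tokens))
    return tokens
```

No amount of extra hardware removes the loop-carried dependency in `generate`; more GPUs only make each iteration of the loop cheaper.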
Yes, because speculation has NEVER bitten us in the ass before, right? *Coughs in Spectre*
Speculative decoding is just running extra hardware to get the same prediction faster. Essentially, setting more money on fire if you're the one paying for the compute.
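For context, a hypothetical sketch of the speculative-decoding control flow: a cheap "draft" model proposes K tokens, the expensive "target" model checks them (one batched pass in a real system), and the longest agreeing prefix is kept. Both models here are toy functions, not real LLMs:

```python
def draft_next(ctx):
    return (sum(ctx) + 1) % 5      # fast, approximate proposer (toy)

def target_next(ctx):
    return sum(ctx) % 5            # slow, authoritative model (toy)

def speculative_step(ctx, k=4):
    # Draft proposes k tokens sequentially (cheap to run).
    proposal = []
    tmp = list(ctx)
    for _ in range(k):
        t = draft_next(tmp)
        proposal.append(t)
        tmp.append(t)
    # Target verifies the proposals; in a real system this is one
    # batched forward pass. Accept up to the first disagreement.
    accepted = list(ctx)
    for t in proposal:
        correct = target_next(accepted)
        if t == correct:
            accepted.append(t)
        else:
            accepted.append(correct)  # target's token replaces the miss
            break
    return accepted
```

The extra spend is the draft model plus the wasted work whenever a proposal is rejected; the win is that accepted runs of tokens cost one target-model pass instead of several.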
While this guide covers roughly 80% of the material, it remains a high-level overview that lacks depth. I can't confirm whether it was LLM-generated, but the content is undeniably superficial. Real-world production environments are far more complex; for instance, despite other users mentioning hugepages and the TLB, there is no discussion of critical issues like TLB shootdown.
It's a bit ironic that the "soft" skills are becoming the hard skills nowadays. A lot of the AI buzz these days is around PMs, data scientists, etc. who now have the tools to code "well enough" and are attractive because of their people skills and/or other skillsets.
Not to say this is an objective analysis, just observing the subjective trends.