Hacker News | hadlock's comments

Seems like the AMD 395+ only manages about 16 tokens/s, which is 25-33% of the speed of SOTA models. Break-even on a $3000 machine is ~15 months.

That's pessimistic. Do the calc assuming cloud provider X changes your nondeterministic output every Y months with probability Z and raises prices by 10% every 6 months.

Slow and steady is worth exponentials. Keep slopppping it, my boid.
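A rough sketch of the break-even arithmetic from the two comments above. It assumes (this is not stated in the thread) that the ~15-month figure implies about $200/month of cloud spend ($3000 / 15), and it ignores electricity, resale value, and the output-drift factor, which has no number attached. Under those assumptions, a 10% price increase every 6 months pulls break-even in from month 15 to month 14.

```python
def break_even_month(machine_cost, monthly_cloud_cost, increase_every=6,
                     increase_pct=0.0, max_months=120):
    """First month in which cumulative cloud spend reaches the machine's price.

    Cloud prices rise by `increase_pct` after every `increase_every` months.
    Returns None if break-even isn't reached within `max_months`.
    """
    spent = 0.0
    price = monthly_cloud_cost
    for month in range(1, max_months + 1):
        spent += price
        if spent >= machine_cost:
            return month
        if month % increase_every == 0:
            price *= 1 + increase_pct  # compounding price hike
    return None


# Assumed $200/month cloud spend, i.e. $3000 / 15 months.
print("flat pricing:", break_even_month(3000, 200))
print("+10% every 6 months:", break_even_month(3000, 200, increase_pct=0.10))
```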


How many t/s output are you getting at Q4_K_M with 200k context on your Strix Halo if you ask it to add a new feature to a codebase?

Qwen 3.6 27B and other dense models, as opposed to MoE models, do NOT scale well with context. As I said in my original post, for 27B usage specifically, I'd take a dGPU with 32GB of VRAM over Strix Halo. I also don't usually benchmark out to 200k; my typical depths are 0, 16k, 32k, 64k, and 128k. That said, with Qwen 3.5 122B A10B I still get 70 tok/s PP (prompt processing) and 20 tok/s TG (token generation) at 128k depth, and with Nemotron 3 Super 120B A10B, 160 tok/s PP and 16 tok/s TG at 128k depth. Among smaller MoE models, I benched Qwen 3.6 35B A3B at 214 tok/s PP at 128k and 34.5 tok/s TG at 131k.

Because dense models degrade so severely, I rarely bench them past 32k-64k. However, I did find a Gemma4 31B bench I ran: it drops to 22 tok/s PP and 6 tok/s TG at 128k.

Nemotron models in particular scale exceptionally well because of their hybrid Mamba2 (SSM) architecture, and I have Nemotron 3 Super benchmarks from 100k out to 600k. For simplicity I'll list them as depth: PP512/TG128, in tok/s.

100k: 206/16
200k: 136/16
300k: 95/14
400k: 61/13
500k: 45/13
600k: 36/12
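To put "scales exceptionally well" in numbers, here is a small sketch that takes the Nemotron 3 Super figures above and expresses each depth as a fraction of the 100k result. From 100k to 600k, TG keeps 75% of its speed (16 to 12 tok/s), while PP falls to about 17% (206 to 36 tok/s).

```python
# Nemotron 3 Super PP512 / TG128 throughput (tok/s) by context depth (k tokens),
# copied from the benchmark list above.
BENCH = {
    100: (206, 16),
    200: (136, 16),
    300: (95, 14),
    400: (61, 13),
    500: (45, 13),
    600: (36, 12),
}


def retention(bench, base_depth=100):
    """Fraction of PP and TG throughput kept at each depth, relative to base_depth."""
    base_pp, base_tg = bench[base_depth]
    return {depth: (pp / base_pp, tg / base_tg) for depth, (pp, tg) in bench.items()}


for depth, (pp_frac, tg_frac) in retention(BENCH).items():
    print(f"{depth}k: PP {pp_frac:.0%}  TG {tg_frac:.0%}")
```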


Thanks, this is very helpful for planning a local LLM purchase. Sounds like we're still at least one generation out (DDR6, 500-700 GB/s memory) from that magic ~25-30 tok/s TG. The Nemotron 3 Super architecture sounds promising.
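The "one generation out" reasoning follows from a common rule of thumb: token generation is usually memory-bandwidth-bound, so a ceiling on tok/s is roughly bandwidth divided by the bytes of active weights read per token. This is a sketch with loudly stated assumptions: ~4.8 bits/weight for Q4_K_M, ~256 GB/s for Strix Halo, and a hypothetical 600 GB/s DDR6-class part; it ignores KV-cache reads (which grow with depth), compute limits, and overheads, so real numbers land below it. With those inputs the 27B dense ceiling is about 16 tok/s at 256 GB/s, in line with the ~16 tokens/s figure at the top of the thread, and about 37 tok/s at 600 GB/s.

```python
def max_decode_tok_per_s(bandwidth_gb_s: float, active_params_b: float,
                         bits_per_weight: float = 4.8) -> float:
    """Rough upper bound on token-generation speed for memory-bound decoding.

    Assumes every active weight is read once per generated token. Ignores
    KV-cache traffic, compute limits, and overheads, so real speeds are lower.
    bits_per_weight=4.8 is an assumed average for Q4_K_M.
    """
    gb_per_token = active_params_b * bits_per_weight / 8  # GB of weights per token
    return bandwidth_gb_s / gb_per_token


# Assumed bandwidths: ~256 GB/s (Strix Halo), 600 GB/s (hypothetical DDR6-class).
for label, bw in [("~256 GB/s", 256), ("~600 GB/s (assumed)", 600)]:
    dense = max_decode_tok_per_s(bw, 27)  # 27B dense: all weights active
    moe = max_decode_tok_per_s(bw, 10)    # A10B MoE: ~10B active per token
    print(f"{label}: 27B dense <= {dense:.0f} tok/s, A10B MoE <= {moe:.0f} tok/s")
```

The MoE ceiling is much higher because only the active experts are read per token, which is also why the observed A10B numbers at 128k sit well under it: at that depth, KV-cache reads are a large share of the traffic.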

Medusa Halo is on my wishlist, but I'm hearing late 2027 :(

M5 Ultra may be a better near-term option; it's expected in June. It supposedly has ~1.2 TB/s of unified memory bandwidth. It's unclear whether Apple will revive the 512 GB SKU or limit it to 256 GB, but the new neural accelerator in every GPU core should help dramatically. These chips were always compute-limited rather than bandwidth-limited, even in the M3 Ultra era.

The big cost, of course, is that you're locked into Apple silicon and Apple's walled garden. You can still use macOS without creating an Apple account... for now...

At least Apple Silicon holds resale value remarkably well.


Secondhand smoke is a large (almost overwhelming) factor in SIDS. Also, for people who don't smoke, it smells f--king disgusting. Nobody wants to deal with that in their life.

United has already cut flights by 5%, and the article says KLM is cutting ~1% of its flights, both citing fuel shortages. If giant companies on opposite sides of the Atlantic are saying this is an issue, it's probably worth taking their word for it.

KLM is citing fuel price, not shortage. They're cutting underutilized flights that they can't operate profitably at current prices. They've explicitly said it's not because of a shortage.

https://nieuws.klm.com/statement-situatie-midden-oosten/


Aren't those identical things? Shortage of commodity X, relative to demand, drives up prices for X.

A shortage can also be physical. The fuel you already bought (and possibly paid for) cannot be delivered. Maybe the actual delivery is the issue. Maybe a government confiscated it for other uses. Or maybe the fuel doesn't exist at all, because the refinery didn't have the oil to produce it.

https://news.klm.com/statement-situation-middle-east/

> ... due to rising kerosene costs, are currently no longer financially viable to operate. There is no kerosene shortage.


10 minutes a day of extreme power usage is probably fine for people asking for directions to the store, setting calendar reminders and timers, checking for important emails, etc. AI on your phone will be incredibly useful, but power usage doesn't matter when total usage is under 15 minutes per day. I don't think the average person expects to vibe code on their phone for 8 hours a day.
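A back-of-the-envelope version of the battery math, as a sketch. Both the ~10 W sustained draw during on-device inference and the ~15 Wh battery capacity are assumptions for illustration, not figures from the thread. Under those assumptions, 15 minutes of heavy use is 2.5 Wh, roughly a sixth of a charge, and 10 minutes is about a ninth.

```python
def battery_fraction(power_w: float, minutes_per_day: float, battery_wh: float) -> float:
    """Fraction of one full charge used by `minutes_per_day` of load at `power_w` watts."""
    energy_wh = power_w * minutes_per_day / 60  # watts x hours = watt-hours
    return energy_wh / battery_wh


# Assumed figures for illustration only: ~10 W during inference, ~15 Wh battery.
for minutes in (10, 15):
    print(f"{minutes} min/day: {battery_fraction(10, minutes, 15):.0%} of a charge")
```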

10 or 15 minutes a day of inference is realistic for fairly small models. Once you start streaming weights in from SSD, things slow down quite a bit and become quite power-hungry.

Harbor Freight sells three tiers of many of their more popular tools, and they're not shy about it. Most of their signage says "ok/better/best", and they're very transparent about what you're buying. I can buy a $9 angle grinder, and on the same shelf I can also buy an $85 angle grinder, with the "better" model running ~$25-40. Harbor Freight used to sell exclusively cheap junk, but their "better" tier stuff is more than adequate for home DIYers.

It probably helps that the founder is still the owner. Once that guy or his son dies (he's getting up there), it would not surprise me if the brand spirals into decay.


The hottest day of the year in the US varies by 3 months from California to Texas, which is only about half the width of the country. I'd imagine the region you're in has a different hottest day of the year than, say, Kashmir or your neighbor Sri Lanka.


The three months difference must be based on a wild corner case. What cities are you basing that statement on?

I played around with Weather Spark, and all the places I tried looked like this:

https://weatherspark.com/compare/y/1705~8813/Comparison-of-t...


I don't know whether to call it a corner case or not, but I was pretty easily able to find this one (based on my own experience – the peak temperature in the East Bay has always felt very late in the year): https://weatherspark.com/compare/y/541~3268/Comparison-of-th...


3 months? Wow. It should be impossible to put seasons on a shared calendar for the whole country.


You can get some black "machinist's layout bluing," which will stain it better than a Sharpie would. It's not going to be a perfect color match, but it'll be better than 50%.


>while ensuring you can't do inconvenient things like say, bulk exporting your own data

I think this is the key. I want my analysts to be able to access the 40% of the database they need to do their job, but not the other 60% that would let them dump the business secrets and start up a business across the street. You can do this to some extent with roles, etc., but in some ways MCP is the data firewall: your last line of protection/auth.
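A minimal sketch of the "data firewall" idea, assuming the tool layer (for example, the handler behind an MCP tool) is the only path analysts have to the data. This is plain Python with `sqlite3`, not the MCP SDK's API; the table, columns, allowlist, and row cap are hypothetical. The point is that the tool, not the analyst, decides which columns are reachable and how many rows come back, so bulk export of the secret 60% isn't possible through it.

```python
import sqlite3

# Hypothetical allowlist: the only tables/columns the analyst-facing tool exposes.
ALLOWED_COLUMNS = {
    "orders": {"order_id", "region", "total"},
}
MAX_ROWS = 1000  # hypothetical hard cap so the tool can't be used for bulk export


def query_allowed(conn, table, columns, limit=100):
    """Run a SELECT restricted to allowlisted columns, with a hard row cap."""
    if not columns:
        raise ValueError("at least one column is required")
    allowed = ALLOWED_COLUMNS.get(table)
    if allowed is None:
        raise PermissionError(f"table {table!r} is not exposed")
    denied = set(columns) - allowed
    if denied:
        raise PermissionError(f"columns not exposed: {sorted(denied)}")
    limit = min(int(limit), MAX_ROWS)
    # Identifiers are safe to interpolate only because they passed the allowlist check.
    sql = f"SELECT {', '.join(columns)} FROM {table} LIMIT ?"
    return conn.execute(sql, (limit,)).fetchall()


# Usage: the table also holds a sensitive column the tool never exposes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, total REAL, customer_email TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "west", 10.0, "a@example.com"), (2, "east", 20.0, "b@example.com")],
)
print(query_allowed(conn, "orders", ["region", "total"]))
```

Database roles can enforce the same column-level restrictions; the tool layer adds a second, application-level check on top, which matches the "last line of protection" framing rather than replacing roles.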


I'm pretty sure you just described Little Caesars Pizza's business model. I recall that way back they were $5, but even today, here in California, their large pepperoni is only $10, where competitors are charging $27-35.

