
rudedogg · 2026-05-28T02:01:36 1779933696

That seems way off to me.

I skimmed the article, but couldn’t spot any details on their estimates. They mention 70b+ params as being large in several places. But we’ve had several 100b+ param models that trail Sonnet.

rudedogg · 2026-05-19T19:07:00 1779217620

If Google is actually getting cheaper inference than everyone else with their TPUs, this smells like trouble to me. Maybe serving LLMs at a profit is proving difficult.

Or maybe they think that because their benchmarks are good they can ramp up the prices. To me, it seems like they don’t yet have the market share to justify a move like that.

tempaccount420 · 2026-05-19T19:30:44 1779219044

This is not priced at inference cost.

My guess: it's the price at which they make more money than if they rent the TPUs to other companies.

The Gemini team has had trouble securing enough TPUs for their users' needs. They struggle with load and their rate limits are really bad. Maybe at a higher price, they have a better chance at getting more TPUs assigned?

gpm · 2026-05-19T20:34:54 1779222894

The price at which they could rent out the TPUs, i.e. the market rate, is the inference cost.

Just because you are vertically integrated doesn't mean you get to discount one business unit's products to the other. Doing so ignores the opportunity cost you pay and is just bad accounting.
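To make the accounting point concrete, here is a minimal sketch in Python. All three numbers are hypothetical and chosen only for illustration, and it assumes every TPU-hour could otherwise be rented out at the market rate (i.e. there is no spare capacity).

```python
# Hypothetical per-TPU-hour figures, for illustration only.
HARDWARE_COST = 1.0      # what it costs the company to run the TPU for an hour
MARKET_RENT = 3.0        # what an outside customer would pay to rent that hour
INFERENCE_REVENUE = 2.5  # what the model team earns serving tokens in that hour


def accounting_profit(revenue: float, hardware_cost: float) -> float:
    """Profit if the model team is charged only the internal hardware cost."""
    return revenue - hardware_cost


def economic_profit(revenue: float, market_rent: float) -> float:
    """Profit if the model team is charged the forgone rental income."""
    return revenue - market_rent


print(accounting_profit(INFERENCE_REVENUE, HARDWARE_COST))  # 1.5
print(economic_profit(INFERENCE_REVENUE, MARKET_RENT))      # -0.5
```

Under these assumed numbers the inference business looks profitable at the internal cost, yet the company as a whole earns 0.5 less per TPU-hour than it would by renting the hardware out. If there is spare capacity that nobody would rent, the assumption fails and the comparison changes.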

KoolKat23 · 2026-05-19T23:45:17 1779234317

Basic business principle: you charge what people are willing to pay, not what it costs.

HDThoreaun · 2026-05-19T21:23:37 1779225817

Depends on if you have spare capacity I think. They have minimal competition so they might be maximizing profit by charging prices higher than what clears all their supply.

sumedh · 2026-05-20T09:00:49 1779267649

> doesn't mean you get to discount one business unit's products to the other

That depends. If all developers get used to Claude and Codex, it will become harder for Google to attract them in the future.

They might lose devs in the long term.

gpm · 2026-05-20T13:42:20 1779284540

Predatory pricing is a great business strategy and all (particularly when countering the competitors' predatory pricing, what could go wrong), but that doesn't mean that the Gemini team should account for it as if they're getting the compute cheaper. It just means that they should run a loss.

flaburgan · 2026-05-20T14:02:56 1779285776

That's actually where AI differs: there is no network effect. So no reason for me to stay with a tool if suddenly another one is better or cheaper. Changing the model I use is literally two clicks in Zed. No retention possible for providers.

dash2 · 2026-05-19T22:06:05 1779228365

Look up “double marginalisation”.
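For anyone who would rather see the arithmetic than look it up, below is a small Python sketch of the textbook double-marginalisation result. It assumes linear demand `p = a - b*q` and a constant marginal cost `c`; the function names and the numbers passed in are hypothetical, not figures for any real provider.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    price: float     # price paid by the final customer
    quantity: float  # units sold
    profit: float    # total profit across the whole chain


def integrated(a: float, b: float, c: float) -> Outcome:
    """One firm owns both stages and maximises (p - c) * q with p = a - b*q."""
    q = (a - c) / (2 * b)
    p = a - b * q
    return Outcome(p, q, (p - c) * q)


def separate(a: float, b: float, c: float) -> Outcome:
    """An upstream unit sets a wholesale price, then downstream adds its own markup."""
    # Downstream's best response to wholesale price w is q = (a - w) / (2b).
    # Upstream maximises (w - c) * (a - w) / (2b), which gives w = (a + c) / 2.
    w = (a + c) / 2
    q = (a - w) / (2 * b)
    p = a - b * q
    upstream_profit = (w - c) * q
    downstream_profit = (p - w) * q
    return Outcome(p, q, upstream_profit + downstream_profit)


print(integrated(100, 1, 20))  # price 60.0, quantity 40.0, profit 1600.0
print(separate(100, 1, 20))    # price 80.0, quantity 20.0, profit 1200.0
```

In this toy model, stacking two profit-maximising markups raises the final price and lowers both volume and combined profit compared with a single integrated decision.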

BoorishBears · 2026-05-19T22:01:31 1779228091

This is trouble if you're not Google/OpenAI/Anthropic: they're all shifting towards pricing for the economic value of the knowledge work they're aiding.

The economic value increases non-linearly as models get more intelligent: being 10% more capable unlocks way more than 10% in downstream value.

That's trouble because the non-linear component means at some point their margins will stop being primarily defined by the cost of compute, and start being dominated by how intelligent the model is.

At that point you can expect compute prices to skyrocket and free capacity to plummet, so even if you have a model that's "good enough", you can't afford to deploy it at scale.

(and in terms of timing, I think they're all well under the curve for pricing by economic value. Everyone is talking about Uber spending millions on tokens, but how much payroll did they pay while devs scrolled their phones and waited for CC to do their job?)

tskj · 2026-05-20T13:40:33 1779284433

Thank you, this is obviously where we're heading. People who think in terms of "will it ever be profitable to sell tokens" are thinking in the wrong framework entirely. The correct framework is "will it be profitable to sell knowledge work", and the answer will clearly be "yes".

spyckie2 · 2026-05-19T20:22:08 1779222128

It's probably that in 1 or 2 years local (free) models will completely take the place of cheap models, so cheap models need to move up the quality chain.

You have free local models for most tasks, $20 subscriptions for near-frontier intelligence, and API per token costs for frontier intelligence.

Flash seems to be targeting the near-frontier category.

TurdF3rguson · 2026-05-19T21:05:07 1779224707

That might work if it wasn't for FOMO. Are you ok with only $20 of frontier usage a month?

rohansood15 · 2026-05-20T01:08:32 1779239312

Subjective, but compare it to compute: not everyone needs the most expensive laptops or supercomputers for their work.

I think frontier models will be invaluable for scientific research, defense, financial analysis and such. But the average person probably would be reasonably well-served with a local model.

If you're in sales, customer service, product management and such - the leading open models at the 30B mark are already good enough.

TurdF3rguson · 2026-05-20T08:22:03 1779265323

I mean customer service maybe, but how much longer will humans even be doing that job at this point?

booty · 2026-05-19T20:41:19 1779223279

Prevailing wisdom is that serving LLMs at a profit is achievable... it's when you factor in the cost of training them that prices get astronomical real fast.

Open-source model inference providers (who do not have to bear the cost of training) seem able to do it at much lower prices.

https://www.together.ai/pricing

https://fireworks.ai/pricing#serverless-pricing (scroll down to headline models)

Of course, it's possible that they are burning through investor cash as well, and apples-to-apples comparisons are not possible because AFAIK Google does not mention the size/paramcount for 3.5 Flash.

But if the prevailing wisdom is true, I think it's actually encouraging. It suggests that OpenAI and Anthropic could perhaps, if they need to, achieve profitability if they slow down model development and focus on tooling etc. instead. If true that's probably good news for everybody w.r.t. preventing a bursting of this economic bubble.

...my opinions here are of course, conjecture built on top of conjecture....

eklitzke · 2026-05-20T00:26:54 1779236814

Most of the training cost is not in the final training run, it's in all of the R&D (including salaries, equity, etc.) that it takes to get to the final training run. The actual cost of all of the TPUs (or GPUs), power, networking, storage, etc. for the final training run is significant, but it's even more expensive to have this huge R&D team doing frontier model development and using a lot of those same resources during development.

I think you're right that releasing models at a slower cadence would bring down costs to some degree, but it's not clear how much. All of these companies could significantly reduce their opex but at the risk of falling behind in terms of being at the frontier.

HDBaseT · 2026-05-19T22:40:56 1779230456

Not to discredit you, because you are 100% correct, but a tangential note about together.ai: they seem fairly unreliable, with constant outages or higher than normal latency.

IncreasePosts · 2026-05-19T19:11:47 1779217907

Maybe the margins are just very large for Google because they predict so much demand for 3.5?

GodelNumbering · 2026-05-19T19:15:23 1779218123

This combined with locally runnable models getting pretty good recently (e.g. Qwen 3.6) tells me that it's time to seriously consider local dev setup again

cft · 2026-05-19T20:09:19 1779221359

This should become Apple's new hardware and software play. I am hopeful about the new CEO.

arcatech · 2026-05-20T11:12:34 1779275554

Nothing new about that play. They have been heading in this direction for a very long time now.

cft · 2026-05-20T16:35:40 1779294940

Perhaps in all that time they would have made the basic spell-checker work on macOS on Apple silicon, then?

MASNeo · 2026-05-19T19:31:47 1779219107

Besides the cost savings, you get control, transparency, and the ability to pick small language models or LoRAs that you can serve even more cost-effectively.

rudedogg · 2026-05-14T21:42:58 1778794978

The “Apps” app is so bad on macOS too (seems built off of Spotlight?). I’ll type the exact app name and it’ll suggest the one on my phone, an installer in Downloads, etc.

No one dogfooded that thing.

nitwit005 · 2026-05-15T00:22:12 1778804532

Someone has realized the search results are insane, as there's at least one obvious fix buried in settings:

I open Finder, click on Applications, search "Google Chrome". Top results? MarketingAnalytics.yaml, aria-proptypes.md, and so on, from some project I cloned off of GitHub into my home directory at some point. I guess the file contents include "Google Chrome"?

Clearly insane, but under the "Advanced" Finder settings, it's easy to find "Search the Current Folder". Suddenly, you get the result you'd expect.

noahbp · 2026-05-15T14:04:31 1778853871

It’s incredible that “Search the Current Folder” is not the default, nor, as far as I’m aware, can it be made the default.

c-hendricks · 2026-05-15T17:30:42 1778866242

> nor, as far as I’m aware, can it be made the default

Huh? You absolutely can, the post you're replying to says as much.

    defaults write com.apple.finder FXDefaultSearchScope SCcf

jshier · 2026-05-16T08:40:34 1778920834

No need for a default, you can set that in Finder’s settings.

veber-alex · 2026-05-15T11:41:43 1778845303

Spotlight search is completely broken.

If I type "sa" the first result is Safari but if I type "safa" I get "Adguard for Safari.app".

In what world does this make sense?

rudedogg · 2026-05-14T19:03:42 1778785422

Zig has these modern language features too fwiw.

I think the goal was to do a massive rewrite for Anthropic (they acquired Bun) and show that rewriting projects from lang -> lang with Claude can reduce security vulnerabilities to help with the hype for an IPO.

I don’t use/know Rust so I can’t comment on the quality, but there was a public security review that found issues with the new Rust code: https://x.com/SwivalAgent/status/2054468328119279923

This is an interesting experiment but I’m skeptical of any claims of success by Jarred/Anthropic due to the incentive to hype agents. There’s probably a trillion dollars at stake with the IPO. And Anthropic seems to be developing this part of their business with Mythos and the super review features.

But I’d like to see the same experiment done on a project without so much riding on the story being a success.

nsagent · 2026-05-14T19:49:50 1778788190

There's a reasonable request to run the same analysis for the Zig version of the code as a comparison.

In lieu of that, it seems the Swival devs ran an analysis on TigerBeetle, one of the other major Zig projects, and found only 7 medium/low priority issues:

https://xcancel.com/SwivalAgent/status/2054063291266113994

matklad · 2026-05-15T00:36:40 1778805400

To clarify, those are things an LLM considers to be issues, and LLMs can make mistakes.

Some of those are clear false positives, others I need to revisit tomorrow to say one way or another.

nsagent · 2026-05-15T18:39:12 1778870352

Agreed. I was more pointing out just how well written TigerBeetle is in comparison (at least according to this LLM-based analysis).

rudedogg · 2026-05-13T18:23:39 1778696619

> The Fed reports

Have you happened to purchase anything in the past 12 months, and looked at the Fed's inflation numbers?

dragonwriter · 2026-05-14T01:29:12 1778722152

> Have you happened to purchase anything in the past 12 months, and looked at the Fed's inflation numbers?

The Fed doesn't issue inflation numbers. The usually cited headline inflation numbers (CPI) are from the Department of Labor’s Bureau of Labor Statistics; the ones used by the Fed as an input to monetary policy decisions (PCE) are issued by the Department of Commerce’s Bureau of Economic Analysis.

rudedogg · 2026-05-11T21:35:27 1778535327

> 1) Higher level code is easier for LLMs to review and iterate upon. The more the intent is clear from the code, the easier it is for humans and LLMs to work with.

The counter-argument, and one that matches my experience, is that working at a lower level is actually beneficial for LLMs, since they can see the whole picture and don’t have to guess at abstractions.

rudedogg · 2026-05-07T22:05:07 1778191507

The abstractions are the veil that makes the theft slightly less obvious.

rudedogg · 2026-05-05T02:15:41 1777947341

> it requires a hands-on approach and actually understanding what's being built.

I think this is true regardless of what language you’re using.

I’ve built a lot in Zig and there’s no difference between vibing stuff in it versus TypeScript/React. Claude can “one-shot” them both, and will mimic existing code or grep the standard library to figure everything out.

dns_snek · 2026-05-05T06:46:35 1777963595

The code may run but it's rarely idiomatic. For example they almost never define functions inside the struct/union/enum namespace unless it already exists and follows that style, i.e. I expect "foo.bar()" but they make it "FooMod.bar(foo)".

rudedogg · 2026-04-13T15:19:23 1776093563

I’ve been vibe-ish coding a GUI Toolkit in Zig w/ SDL3 and Vulkan for a few months now.

It has lots of features, but I posted a demo of some fun with buttons here: https://x.com/rudedoggtweets/status/2043531378181161357

I think I’m building up an agentic IDE, just haven’t committed yet, but probably will this month.

One cool new thing I’m trying is running models directly w/ Vulkan. I’m about halfway there with my first model, but it’s going better/easier than I anticipated and I’m hoping I can make something very specialized and fast.

rudedogg · 2026-04-06T01:55:58 1775440558

This is fun. FYI, you don’t have to sign in/up with a Google account. I hesitated to download it for that reason.