
Does local AI have a future? The models are getting ridiculously big, storage hardware is being hoarded by a few companies for the next 2 years, and Nvidia has stopped making consumer GPUs for this year.

It seems to me there is no chance local ML will get beyond toy status compared to the closed-source models in the short term.


Mistral have small variants (3B, 8B, 14B, etc.), as do others like IBM Granite and Qwen. Then there are finetunes based on these models, depending on your workflow/requirements.

True, but anything remotely useful is 300B and above

That is a very broad and silly position to take, especially in this thread.

I use Devstral 2 and Gemini 3 daily.


I am actually doing a good part of my dev work now with Qwen3-Coder-Next on an M1 64GB with Qwen Code CLI (a fork of Gemini CLI). I very much like

  a) to have an idea how many tokens I use,
  b) to be independent of VC-financed token machines, and
  c) to be able to use it on a plane/train
Also I never have to wait in a queue, nor will I be told to wait for a few hours. And I get many answers in a second.

I don't do full vibe coding with a dozen agents though. I read all the code it produces and guide it where necessary.

Last but not least, at some point the VC-funded party will be over, and when that happens it's better to already know how to be highly efficient with AI token use.


How many tokens per second are you getting?

What's the advantage of Qwen Code CLI over OpenCode?


320 tok/s prompt processing (PP) and 42 tok/s generation (TG) with a 4-bit quant and MLX. Llama.cpp was about half that for this model, but AFAIK it improved a few days ago; I haven't tested it yet though.

I have tried many tools locally and was never really happy with any of them. I finally tried Qwen Code CLI assuming it would run well with a Qwen model, and it does. YMMV; I mostly do JavaScript and Python. The most important setting was the max context size: it then auto-compacts before reaching it. I run with 65536 but may raise this a bit.
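In case it helps anyone reproduce the setup, here is a minimal Python sketch of talking to a locally served model through an OpenAI-compatible endpoint (what servers like mlx_lm.server or llama.cpp's llama-server expose, and what Qwen Code CLI can be pointed at). The port and model id below are placeholders, not my exact config:

  # Minimal sketch, with assumptions: an OpenAI-compatible server is
  # already running on localhost:8080 with a Qwen coder model loaded;
  # the model id is a placeholder.
  from openai import OpenAI

  client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")

  resp = client.chat.completions.create(
      model="qwen3-coder-next-4bit",  # use whatever id your server reports
      messages=[{"role": "user",
                 "content": "Write a Python function that reverses a string."}],
      max_tokens=512,
  )
  print(resp.choices[0].message.content)

The same base URL and model id are what go into the CLI's OpenAI-compatible settings, as far as I can tell.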

Last but not least, OpenCode is VC-funded and at some point they will have to make money, while Gemini CLI / Qwen Code CLI are not their companies' primary products but are definitely dog-fooded.


Gemini just doesn't do even mildly well at agentic stuff, and I don't know why.

OpenAI has mostly caught up with Claude in agentic stuff, but Google needs to be there and be there quickly


Because Search is not agentic.

Most of Gemini's users are Search converts doing extended-Search-like behaviors.

Agentic workflows are a VERY small percentage of all LLM usage at the moment. As that market becomes more important, Google will pour more resources into it.


> Agentic workflows are a VERY small percentage of all LLM usage at the moment. As that market becomes more important, Google will pour more resources into it.

I do wonder what percentage of revenue they are. I expect it's very outsized relative to usage (e.g. approximately nobody who is receiving them is paying for those summaries at the top of search results)


> Most agent actions on our public API are low-risk and reversible. Software engineering accounted for nearly 50% of agentic activity, but we saw emerging usage in healthcare, finance, and cybersecurity.

via Anthropic

https://www.anthropic.com/research/measuring-agent-autonomy

this doesn’t answer your question, but maybe Google is comfortable with driving traffic and dependency through their platform until they can do something like this

https://www.adweek.com/media/google-gemini-ads-2026/


> (e.g. approximately nobody who is receiving them is paying for those summaries at the top of search results)

Nobody is paying for Search. According to Google's earnings reports - AI Overviews is increasing overall clicks on ads and overall search volume.


So, apparently switching to Kagi continues to pay dividends, elegantly.

No ads, no forced AI overviews, no profit-centric reordering of results, plus being able to reorder results personally, and more.


The agentic benchmarks for 3.1 indicate Gemini has caught up. The gains from 3.0 to 3.1 are big.

For example the APEX-Agents benchmark for long time horizon investment banking, consulting and legal work:

1. Gemini 3.1 Pro - 33.2%
2. Opus 4.6 - 29.8%
3. GPT 5.2 Codex - 27.6%
4. Gemini Flash 3.0 - 24.0%
5. GPT 5.2 - 23.0%
6. Gemini 3.0 Pro - 18.0%


In mid-2024, Anthropic made the deliberate decision to stop chasing benchmarks and focus on practical value. There was a lot of skepticism at the time, but it's proven to be a prescient decision.

Benchmarks are basically straight up meaningless at this point in my experience. If they mattered and were the whole story, those Chinese open models would be stomping the competition right now. Instead they're merely decent when you use them in anger for real work.

I'll withhold judgement until I've tried to use it.


Does anyone know what this "APEX-Agents benchmark for long time horizon investment banking, consulting and legal work" actually evaluates?

That sounds so broad that creating a meaningful benchmark is probably as difficult as creating an AI that actually "solves" those domains.


What's your opinion of GLM-5, if you've had a chance to use it?

I haven't yet, though I will be trying it this weekend!

Ranking Codex 5.2 ahead of plain 5.2 doesn't make sense. Codex is expressly designed for coding tasks. Not systems design, not problem analysis, and definitely not banking, but actually solving specific programming tasks (and it's very, very good at this). GPT 5.2 (non-codex) is better in every other way.

Codex has been post-trained for coding, including agentic coding tasks.

It's certainly not impossible that the better long-horizon agentic performance in Codex overcomes any deficiencies in outright banking knowledge that Codex 5.2 has vs plain 5.2.


It could be problem-specific. There are certain non-programming things that Opus seems better than Sonnet at as well.

Swapped Sonnet and Opus in my last reply, oops.

The marketing team agrees with the benchmark scores...

LOL come on man.

Let's give it a couple of days, since no one believes anything from benchmarks, especially from the Gemini team (or Meta).

If we see on HN that people are willingly switching their coding environment, we'll know "hot damn, they cooked"; otherwise this is another whiff by Google.


You can’t put Gemini and Meta in the same sentence. Llama 4 was DOA, and Meta has given up on frontier models. Internally they’re using Claude.

After spending all that money and firing a bunch of people? Is the new group doing anything at this point?

They are busy demonstrating that Mark Zuckerberg has no sense at all.

I suspect a large part of Google's lag is due to being overly focused on integrating Gemini with their existing product and app lines.

My guess is that the Gemini team didn't focus on large-scale RL training for agentic workloads, and they are trying to catch up with 3.1.

I've had plenty of success with skills juggling various entities via CLI.

It's like anything Google - they do the cool part and then lose interest with the last 10%. Writing code is easy, building products that print money is hard.

One does not need products if one has a monopoly on search.

That monopoly is worth less as time goes by and people more and more use LLMs or similar systems to search for info. In my case I've cut down a lot of Googling since more competent LLMs appeared.

Can you explain what you mean by it being bad at agentic stuff?

Accomplishing the task I give it without fighting me on it.

I think this is a classic precision/recall issue: the model needs to stay on task, but also infer what the user might want but hasn't explicitly stated. Gemini seems particularly bad on the recall side, where it goes out of bounds.


cool thanks for the explanation

Flash models are nowhere near Pro models in daily use. Much higher hallucination rates, and it's easy to get into a death spiral of failed tool uses and never come out.

You should always take claims that smaller models are as capable as larger models with a grain of salt.


Flash model n is generally a slightly better Pro model (n-1); in other words, you get to use the previously premium model as a cheaper/faster version. That has value.

They do have value, because they are much, much cheaper.

But no, 3.0 Flash is not as good as 2.5 Pro. I use both of them extensively, especially for translation. 3.0 Flash will confidently mistranslate certain things, while 2.5 Pro will not.
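If anyone wants to reproduce that kind of comparison, here is a minimal sketch using the google-genai Python SDK. The model ids are just the ones named in this thread and the prompt is illustrative, not my actual test set:

  # Minimal sketch: run the same translation prompt through both models
  # and compare the outputs by eye. Assumes an API key is set in the
  # environment; the model ids below are placeholders from this thread.
  from google import genai

  client = genai.Client()
  prompt = "Translate to English, preserving names and honorifics:\n<source text here>"

  for model in ("gemini-3.0-flash", "gemini-2.5-pro"):
      resp = client.models.generate_content(model=model, contents=prompt)
      print(f"--- {model} ---\n{resp.text}\n")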


Totally fair. Translation is one of those specific domains where model size correlates directly with quality, and no amount of architectural efficiency can fully replace parameter count.

I am fine with the founder joining OpenAI; he gets paid regardless.

I am not confident that the open-source version will get the maintenance it deserves though, now that the founder has exited. There is no incentive for OpenAI to keep the open-sourced version better than their future closed-source alternative.


Why?

You can literally ask codex to build a slim version for you overnight.

I love OpenClaw, but I really don't think there is anything that can't be cloned.


It is over

I for one welcome our new AI overlords.

LLMs are new runtimes.


Alright I’m out


> If you would like to grieve, I invite you to grieve with me.

I think we should move past this quickly. Coding itself is fun but is also labour; building something is what is rewarding.


By that logic prompting an AI is also labour.

It's not even always a more efficient form of labour. I've experienced many scenarios with AI where prompting it to do the right thing takes longer and requires writing/reading more text compared to writing the code myself.


Then you are using it the wrong way

Driving is a skill that needs to be learnt, same with working with agents.


I give it a year; the realization will be brutal.

