The model selection for title generation works as follows (prompt.ts:1956-1960):
1. If the title agent has an explicit model configured — that model is used.
2. Otherwise, it tries Provider.getSmallModel(providerID) — which picks a "small" model from the same provider as the current session, using this priority list (provider.ts:1396-1402):
- claude-haiku-4-5 / claude-haiku-4.5 / 3-5-haiku / 3.5-haiku
- gemini-3-flash / gemini-2.5-flash
- gpt-5-nano
- (Copilot adds gpt-5-mini at the front; opencode provider uses only gpt-5-nano)
3. If no small model is found — it falls back to the same model currently being used for the session.
So by default, title generation uses a cheaper/faster small model from the same provider (e.g., Haiku if on Anthropic, Flash if on Google, nano if on OpenAI), and if none are available, it just uses whatever model the user is chatting with. You can also override this entirely by configuring a model on the title agent.
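The three-step fallback above can be sketched as a small function. This is a hypothetical illustration, not opencode's actual code: the function and parameter names are assumptions, and the real implementation in provider.ts matches candidates per provider rather than against a flat list.

```typescript
// Priority list from the description above (Copilot prepends gpt-5-mini;
// the opencode provider uses only gpt-5-nano).
const SMALL_MODEL_PRIORITY = [
  "claude-haiku-4-5", "claude-haiku-4.5", "3-5-haiku", "3.5-haiku",
  "gemini-3-flash", "gemini-2.5-flash",
  "gpt-5-nano",
];

function resolveTitleModel(
  agentModel: string | undefined, // explicit model on the title agent, if any
  providerModels: string[],       // model IDs available from the session's provider
  sessionModel: string,           // model the session is currently using
): string {
  if (agentModel) return agentModel;              // 1. explicit agent model wins
  for (const candidate of SMALL_MODEL_PRIORITY) { // 2. provider's "small" model
    const match = providerModels.find((m) => m.includes(candidate));
    if (match) return match;
  }
  return sessionModel;                            // 3. fall back to the session model
}
```

For example, a session on Anthropic with no title-agent model configured would resolve to a Haiku variant, while a provider exposing no small model falls through to the session's own model.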
In my case, I ran a single local llama.cpp server instance as my main model without setting a small model, and while my prompts went to it, title generation did not.
Chat titles kept working even when the local llama.cpp server hadn't started, and the requests never appeared in the llama.cpp logs: titles were being generated by an external model I hadn't set up and had not intended to use.
It was only when I set `small_model` that I was able to route title generation to my own models.
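For anyone hitting the same issue, a minimal config sketch. This assumes opencode's JSON config schema with a `small_model` key; the `llamacpp` provider name and model ID here are placeholders for whatever your local setup uses:

```json
{
  "model": "llamacpp/my-local-model",
  "small_model": "llamacpp/my-local-model"
}
```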
They can, but they need to be explicitly told to do that; otherwise they just do everything in batches. And pure TDD or not, tests catch only what you tell the AI to write. The AI does not know what is right; it does what you told it to do. The above problem wouldn't be solved by pure TDD.
I have a 5-year-old iPhone SE (2020) that is relatively cheap considering it is 5 years old. None of the Androids I owned served me that long. Only a Motorola came close, but water killed it. Water did not kill the iPhone when my son threw it into a pond. Which Android is that good, practically speaking?
Relative to an iPhone Pro, yes. Relative to many other phones, no. It shipped at $399; you can buy 4 to 12 Android phones for that price. I'm an iPhone user, but my sister and her family are Android.
I doubt I would get the same quality and reliability. Good Android phones are equally expensive, and it is very hard to know which ones are actually good without doing research. I also had a bad experience with one Google Pixel model.
That is always the excuse people bring up to ignore the point. You can spend $1000 on a fancy chef's knife, or you can spend $30 on an Ikea chef's knife. Sure, the $1000 knife is higher quality. Yet millions of people are still doing just fine with the Ikea knife.
A cheap car will still get you to/from work over an expensive "higher quality" car.
Lots of families don't have money to buy an iPhone for every member of the family but do have enough to buy an Android for every member of the family.
I am typing this on a 9 year old iPhone 8 Plus. Battery was replaced once after 6 years, replacement battery is still lasting more than a day. Apps are slowly losing support for it, but other than that it mostly does what I want, and still gets security updates for really bad stuff.
I still have one of those lying around in the drawer. It's the backup phone, and every time I or my partner needs to use it I am surprised at how well it still works.
I have an S21, which was released in early 2021. I bought it new in late 2023 for 430€. I don't see any reason to get a new one currently. I did have to service it twice for water damage, to be honest, but the service was free.
Maybe it is language specific? Maybe LLMs have a lot of good JavaScript/TypeScript samples for training, and it works for those devs (e.g. me). I heard that Scala devs have problems with LLMs writing code too. I am puzzled by good devs not managing to make LLMs work for them.
I definitely think it's language specific. My memory may deceive me here, but I believe that LLMs are infinitely better at pumping out Python scripts than Java. Now, I have much, much more experience with Java than Python, so maybe it's just a case of what you don't know... However, the tools it writes in Python just work for me, and I can incrementally improve them, and the tools get noticeably better and more aligned with what I want.
I then ask it to do the same thing in Java, and it spends half an hour trying to do the same job and gets caught on some bit of trivia around how to convert HTML escape characters, for instance `s.replace("&lt;", "<").replace("&gt;", ">").replace("&quot;", "\"").replace("&amp;", "&");`, and it endlessly compiles and fails over and over again, never able to figure out what it has done wrong, nor deciding to give up on the minutiae and continue with the more important parts.
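For what it's worth, the chained-replace unescaping the LLM stumbles on is easy to get subtly wrong in any language: `&amp;` has to be handled last, or an input like `&amp;lt;` gets double-unescaped. A minimal sketch (in TypeScript here, though the comment above is about Java; `unescapeHtml` is my own name, not a standard API):

```typescript
// Unescape the basic HTML entities. Order matters: &amp; must come last,
// otherwise "&amp;lt;" would first become "&lt;" and then wrongly "<".
function unescapeHtml(s: string): string {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, "&");
}
```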
Maybe it's because there's no overall benefit to these things.
There's been a lot of talk about it for the past few years but we're just not seeing impacts. Oh sure, management talk it up a lot, but where's the corresponding increase in feature delivery? Software stability? Gross profit? EBITDA?
Give me something measurable and I'll consider it.
When I used it before Christmas (free trial), it very visibly paused for a bit every so often, telling me that it was compressing/summarising its too-full context window.
I forget the exact phrasing, but it was impossible to miss unless you'd put everything in the equivalent of a Ralph loop and gone AFK or put the terminal in the background for extended periods.
However, I run about 3 concurrent sessions that do multiple compacts throughout, for about 8 hrs/day, and I go through a 20x subscription in about half a week. So I'm extremely skeptical of these negative claims.
Edit: However I stay on top of my prompting efficiency, maybe doing some incredibly wasteful task is... wasteful?
This is where engineering practices help. Based on 1.5 years of data from my team, I can say that I see about a 30% performance increase on a mature system (about a 9-year-old code base), maybe more. The interesting part: LLMs are leverage; the better an engineer you are, the more you benefit from an LLM.
I guess I am kind of an "AI evangelist" in my circles (team, ecosystem, etc.). I personally see benefits in "AI" both for side projects and at my main work. However, according to my latest measurements, the improvement is not dramatic; it is substantial (about 30%), but not dramatic. I share my insights purely to have less on my shoulders (if my team members can do more, there is less for me to do).
In some cases that's true, but sometimes you need to update the cutting rules because of law changes, or because you saw a different way of cutting, for example. There are cases where this is not a one-time investment. What I do agree with is that cutting it yourself has become significantly cheaper.