Hacker News | solomatov's comments

It would have been better if they provided not just weights, but also some frontend where it is usable as is.

>but I have seen the local 122b model do smarter more correct things based on docs than opus

Could you please share more about this?


Maybe a bit misleading. I have used it in two places.

One is for local opencode coding and config of stuff; the other is for agent-browser use. For both, it did better than Opus 4.6 at the thing I was testing at the moment. The problem with Opus when I tried it was overthinking and sometimes moving itself in the wrong direction (not that Qwen doesn't overthink sometimes). However, sometimes less is more; maybe turning thinking down on Opus would have helped me. Some people said that it is better to turn it off entirely when you start to implement code, as it already knows what it needs to do and doesn't need more distraction.

Another example is my Ghostty config: I learned from Qwen that it has theme support. Opus would always just put the theme in the main file.


Just curious, the fixes are not about weights but about templates, am I right?

Yes, so chat templates and the actual implementations.

Did anyone try it and Gemma 4? Does it feel that it's better than Gemma 4?

Does anyone have any tips for starting with Gastown? I am comfortable with a couple of agents running, but not yet comfortable with what Gastown offers.


Set a budget. Fund an openrouter account with the max you can stomach spending on this test and give it a shot.

At least, that’s what I would do, if I had any interest in testing out gastown with my own money. If my employer wants to pay for the testing, that’s another question entirely.


I mean not how to do it, it's not that hard, but how to be productive with it.


> more like supervising 8-15 agents

How do they do it? (My own record is 5 agents, but it is not typical). Do they use gastown or something?


I often have 10+ running in parallel. I’m attacking parallel problems that aren’t interdependent. Sometimes adding additional products can bring me up to 15+.

Gotta have really good test harnesses so they can largely fix themselves.


But how do you handle that amount of multitasking? Could you give an example? I mean, what kind of tasks allow such parallelization?


context switching across the entirety of the feature surface for an app

You could easily have agents work on the login page, messaging feature, database/data model update, recommender system, backend API, etc.


We have our doubts about this. Can you share your code or product? Anecdotally, my mistakes and lack of understanding exponentiate the more I try to parallelize.


Who is “we”?

As I said in the neighboring comment, for vibe coding side projects and prototypes for work I just merge and iterate. It works out more than it doesn’t. For anything bigger at work I cannot share as I’m at Apple.


But you have to keep it all in your head and remember everything at the same time. How is it possible to keep track and do reviews one after another? Or are these pretty long-running agents?


I’m not sure what you mean by keep it in your head? I know all of the parts the agents are working on. It’ll often be a mix between bigger tasks (some large refactor, new feature, etc) and small tasks (little bug fixes).

For prototyping I just merge. I don’t bother to review the code. For anything more important, I am reviewing the code and going back and forth. Basically there’s a queue of stuff demanding my attention, and I just serially go through them.

What’s also been really helpful to me is /simplify and similar code review skills (I have my own). That alone takes an agent a while, as it parses through everything it’s done and self-reviews. It catches quite a lot itself this way.


>I’m not sure what you mean by keep it in your head?

If the project I work on is large enough, it takes me some time to load everything I need to understand for a review into short-term memory. If it's small enough, it's less of a problem for me.


Honestly, I don't know. I could be mistaken about the exact number of agents, but not about the fact that the AI-driven workflows are heavily automated and go on for hours.

He's one (small) step from distinguished engineer, with 20+ patents to his name, and is an embedded programmer (largely C/C++) with 30+ years of experience in the field; and I've known him for nearly as long, so I give a lot of credence to his words.

But we don't usually talk work; he's the guitarist in our band :) [I'm the bass] So we mainly chill over music + beer. And lately, it's been less chill ¯\_(ツ)_/¯


Is there any publication which demonstrates that the improvement is really 10x?


It's like "decimate": you would think 10x had literal force, but it's more figurative. It just means "moar".

(Decimate had a specific literal intent. Now it's just a force modifier, like bigly.)


The literal meaning was removing 1/10


> Removing 1/10

feels euphemistic for the original “colloquial” usage I have for it.

> The killing of one in ten, chosen by lots, from a rebellious city or a mutinous army was a punishment sometimes used by the Romans. The word has been used (loosely and unetymologically, to the irritation of pedants) since 1660s for "destroy a large but indefinite number of." [0]

[0] https://www.etymonline.com/word/decimate


Yup. What amuses me is that people think that to decimate is to massively degrade something. I assume they're thinking "reduce to 1/10th" rather than "reduce to 9/10ths". The effect is markedly different.


A watched pot never boils. A watched vibe coder never 10x-es.


What could this crate be used for?


For converting HTTP URLs into interactive images of the webpage.

In other words: an internet browser.


Could you recommend which quantization level to use with it?


Does the GitHub Copilot ToS allow this?



This is very interesting. This could allow custom harnesses to be used economically with Opus. Depending on the usage limits, this may be cheaper than their API.


I don't see why not. It's just using the GitHub Copilot API.

