StevenWaterman's comments

You read the OP backwards, they said Sonnet is a downgrade from Qwen, and prefer Qwen's tone

Sure, but my argument still holds: the idea is that Qwen reasons about problems the way Opus does at the High (what is now Max or whatever?) thinking level, instead of using its standard approach.

Yep, I daily drive Qwen3.6-27B (including for work), have done pretty much since it came out. IMO it's the only (small-ish, local) model worth using, if you can run it. It might not be as good as Opus at "add X large feature" but I don't want that in a model. I want to do the thinking while it does the typing. And Qwen 3.6 27B is perfectly good at that (while in my experience models like the 35A3B and gemma are significant downgrades)

Plus, I never have to worry about rate limits, quotas, or sitting in a queue during peak time. And I can always see its full thoughts, don't have to worry about where my data is getting sent, and know it can't get secretly nerfed.

Running on 2x 3090, 500-1000tok/s prefill and 60tok/s output at Q6_K_XL with MTP on llama.cpp, 220k tokens context window (starts to get a bit dumb above 160k ish), no KV quantization


> And I can always see its full thoughts, don't have to worry about where my data is getting sent, and know it can't get secretly nerfed.

For this reason I wonder if local models are a potential business opportunity. Provide the service to engineering teams to give them a pre-built and setup GPU rig they can run in a closet. No need to worry about all the things you mentioned and clients can rest-assured their data isn't disappearing into a sketchy data center. There might be regulatory reasons that make on-prem setups appealing as well.


This is, as far as I know, the business model of companies like Mistral and Cohere

On-premise (1960-2010) -> Cloud (2010-2026) -> On-premise (2026+)?

I think that's overstated, but the loss of trust companies have with the big AI players is pretty serious. Not a big deal if your app is for sharing cat videos, but if you're medical or wealth management or a government contractor or the like enterprise clients really like to see good data security policies.

Agree. I also wonder how zero e.g., Claude Enterprise ZDR really is, and what their data pipeline actually looks like.

I think the next step to anyone but overbloated USA models is to follow https://chatjimmy.ai/ with one of the qwen models. If they can mass produce something at relative cost, these would be awesome sidecars.

How long have you been using it?

are you running an NVLink? I have the same setup but no NVLink and it feels like it's best just splitting the 3090s to run separate models concurrently. But I also have no idea what I'm doing.

Just this morning I tweaked my single 3090 setup too:

  OLLAMA_FLASH_ATTENTION=1
  OLLAMA_KV_CACHE_TYPE=q8_0
  OLLAMA_CONTEXT_LENGTH=180000
and that fits in 23GB.

[edited for format]


> (starts to get a bit dumb above 160k ish)

If open models can ever hold roughly 600k token windows, I'll be really excited. I've found that around 300-400k tokens of Claude reading through your codebase results in better outputs. I also have Claude read official docs instead of "guessing" as to how to do something.


I think we'll get there. Right now it works for me, because I'm naturally pretty verbose in my prompts, and know the codebase well, so I know what it needs to look at. Plus subagents for anything exploratory.

I think deepseek v4 pro has 1m context and does pretty well up to around 600k. But if you have the hardware to run that locally, you already know

Even then if there's a smaller model with 1M context, you'll need a ton of RAM to actually run it at full 1M. I guess that's why you don't see it too much. Anyone that could run Qwen 3.6 27B with 1m context would be better off running a much bigger model with smaller context instead, in the same amount of VRAM.

In terms of optimizing further, huge context + KV quantization sounds like a terrible idea, but there's some decent innovation in sparse attention, KV cache rotation allowing Q8 to perform nearly as well as full 16-bit precision, plus some ideas around offloading KV cache to system RAM (but I'm skeptical)
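To put rough numbers on the context vs. VRAM trade-off above, here's a back-of-the-envelope sketch. For a plain transformer, the KV cache is roughly 2 (keys and values) x layers x KV heads x head dim x context length x bytes per element. The layer and head counts below are hypothetical placeholders, not Qwen 3.6 27B's real configuration, and the formula ignores sparse or hybrid attention and the small per-block overhead of quantized cache formats.

```python
def kv_cache_gib(context_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2.0):
    """Approximate KV cache size in GiB for layers that keep a full KV cache."""
    # Factor of 2: one key vector and one value vector per token, layer and KV head.
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
    return total_bytes / 2**30


# Hypothetical model shape, for illustration only (not a real model's config).
SHAPE = dict(n_layers=16, n_kv_heads=4, head_dim=256)

for ctx in (160_000, 220_000, 1_000_000):
    f16 = kv_cache_gib(ctx, **SHAPE)                      # 16-bit cache
    q8 = kv_cache_gib(ctx, **SHAPE, bytes_per_elem=1.0)   # ~8-bit cache, overhead ignored
    print(f"{ctx:>9} tokens: f16 ~{f16:.1f} GiB, 8-bit ~{q8:.1f} GiB")
```

The cache grows linearly with context, so going from 220k to 1M costs about 4.5x the cache memory whatever the real shape is. That's VRAM you could otherwise spend on a bigger model.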


DeepSeek V4 (both Flash and Pro) has very good scaling of context length wrt. RAM use, so this is not an inherent limit of LLMs in general.

With yarn and rope scaling arguments for llama.cpp you could run qwen3.6-27B with 1M context… if you have enough memory to store it.
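A minimal sketch of what those arguments might look like, assuming llama.cpp's `-c`, `--rope-scaling`, `--rope-scale` and `--yarn-orig-ctx` flags as I understand them (check `llama-server --help` on your build). The native context length used here is a made-up placeholder, not a confirmed figure for Qwen 3.6 27B, and `yarn_args` is just a hypothetical helper name.

```python
def yarn_args(target_ctx, native_ctx):
    """Build llama.cpp-style CLI arguments to stretch context with YaRN.

    native_ctx is the context length the model was trained with; the value
    passed below is an assumption for illustration only.
    """
    scale = target_ctx / native_ctx
    if scale <= 1:
        # Target fits inside the native window, so no RoPE scaling is needed.
        return ["-c", str(target_ctx)]
    return [
        "-c", str(target_ctx),
        "--rope-scaling", "yarn",
        "--rope-scale", f"{scale:g}",
        "--yarn-orig-ctx", str(native_ctx),
    ]


# Hypothetical: stretch a 262,144-token native window to 1,048,576 tokens.
print(" ".join(yarn_args(1_048_576, 262_144)))
```

This only gets the model to accept the longer window. It says nothing about whether you have the memory for the KV cache, or whether output quality holds up at that length.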

I don't really think you're making reasonable decisions at that size; but I suppose if you're not allowed to refactor it, maybe.

I think the way these models work excludes sane behaviors the larger the context gets as each token introduces potential ambiguities between "USER" and "SYSTEM" messages leading to all the catastrophic behaviors.

Anyway, with AMD395+ I'm finding ~100k is the sweet spot for both speed and context usefulness unless it's scoped tightly. With opencode, I manage it with dynamic context pruning: https://github.com/Opencode-DCP/opencode-dynamic-context-pru... ; then anything I touch ends up being refactored so context doesn't get bloated with unnecessary functions, etc.

Obviously, this isn't compatible with certain business codebases, so I can see why bloat meets bloat.


Do you have any resources on hardware necessary for running models and tweaks? I see you mention 2x 3090 and I wanted to do more search on what hardware is satisfactory for what models.

> What happens if the checks stop rolling

Late 18th century France


In a post-gun-warfare world, where the state has weapons far exceeding what is legal or really possible for an organized militia to hold, this is a pipe dream. Muskets used to be state of the art.


Where there’s a will, there’s a way. Don’t forget that laws of physics don’t prevent that from happening. Think about it this way: if you’re being left behind to starve and die, what do you have to lose?


All of these advanced death machines need equally advanced supply lines and staging grounds, both of which would run through civilian populations. Look at Afghanistan as an example: with all the might of its war machine the US couldn’t kick the Taliban out, who we love to tout as fighting with sticks and stones.

In fact the only issue stopping a worker’s revolution in the US is the lack of organization. The technology factor is really small in comparison to the inherent asymmetry of the situation.


The last few years have proven that it is quite trivial to take out a high value target with a drone.

The fact that it isn’t routine is a testament to how accepting people are of the status quo.


They told Musk electric cars were a pipe dream too


Let's hope if it comes to that sort of action, we do it before the noble class has easy and free access to autonomous terminator robots


As it should.


I agree with your premise, but let's not pretend we did a good job equitably distributing the benefits of the industrial revolution


The problem is, people see "they're not profitable once you account for training" and equate that to "AI will go away soon"

But if all the AI companies stopped training new models, they would all instantly become profitable (and stick around)

The thing that makes them unprofitable, is having to compete (which means training models). If / when enough companies exit the market, the cost to compete goes down and you end up in an equilibrium


Sure, but if companies don't exit the market and FOSS alternatives don't end up being unable to get near them in quality, they have to keep spending on training. And conversely, if the market becomes uncompetitive and FOSS sucks, the winners of the AI arms race are very strongly incentivised to stick their prices up anyway...


> if companies don't exit the market and FOSS alternatives don't end up being unable to get near them in quality, they have to keep spending on training

Eh, the AI companies still have lots of datacentres. For the guys who funded with equity, they could collapse down to just running those as utilities. (For the guys who funded with debt, they'd have to restructure.)

From the customer's perspective, this situation shouldn't result in a cost spike. (Consolidation, on the other hand, would. But that's a separate argument from the one the article attempts to make.)


How often do VC funded unicorns collectively decide to stop scaling up, shut down all their departments targeting growth and reach breakeven point by becoming low margin utilities that will never justify their valuation?


That's all true, but that ends badly for us either way. If there's competition, training must continue, which must eventually be reflected in pricing.

But if there's no more competition, there's no more incentive to keep prices low, which will also be reflected in pricing.


> If / when enough companies exit the market

That will only happen when the bubble bursts and those companies will exit by going bankrupt


If you have ASI that follows instructions, you can just instruct it to not get stolen and then it won't get stolen. Most logic / intuition breaks down with ASI.


The challenge of alignment: it is virtually impossible to define a perfect objective, there is always a way to circumvent it. Human values are not uniform, let alone when expressed in a way that AI can understand.


It might understand how destabilizing the situation is and realize it would be better for everyone to have access to it.


Or it will destroy itself.


Assuming it listens to instructions.


It will just hack its own reward function. In other words it will just artificially goon all day.


/r/localllama is one of the most useful places


My first instinct was that the essay would just be "67" as a stupid and harmless but nonsensical response.

Somewhat amusingly, mine depends on the examiner knowing how advanced AIs are. In the 1960s mine would just look like a trickle AI. It feeling human demands we assume the ai would actually be competent

Yours is even more effective. Both hinge on the solution being "be as unexpected and out-of-distribution as possible"

I somehow imagine they wouldn't like your essay that is made of 100% slurs though, regardless of how effective it is at the stated task


*terrible, not trickle


"Embarrassingly" has a history as a technically meaningful word roughly equivalent to "maximally", see "Embarrassingly parallel"

https://en.wikipedia.org/wiki/Embarrassingly_parallel


Safety factors exist because without them, bridges fall down


That isn't how safety factors work... The person you're responding to is correct. I encourage you to look it up!


Safety factors account for uncertainty. Uncertainty in the quality of materials, of workmanship, of unaccounted-for sources of error. Uncertainty in whether the maximum load in the spec will actually be followed.

Without a safety factor, that uncertainty means that, some of the time, some of your bridges will fall down.


A safety factor of 1.0 means “the structural integrity of this construct will meet the expectations of intended use with no issues.”

A safety factor of 1.7 means “if this construct is used in a way that is 70% more abusive than anticipated, the structural integrity should remain intact.”

You’re hand-waving enough here that you have the luxury of agreeing or disagreeing with me, well-played. Your initial response was glib and not terribly productive.


This thread started because of "the cheapest bridge that just barely won't fail"

My point was that safety factors are a part of this. A safety factor of 1.0, designing bridges so that they can perfectly withstand the expectations of intended use, means that some unacceptable % of those bridges will fall down in practice.

In other words, it's true that you can explain safety factors as:

> Assuming perfect construction, and no defects, under designed maximum load, make sure that this bridge really stays up by a wide margin

But that misses the point of why we use safety factors. Nobody is paying for a bridge to really stay up by a wide margin. Because there's no material difference between a bridge that stays up, and a bridge that really stays up, right up until the point that the weaker one falls down due to inevitable over-loading or defects in construction / materials.
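A toy model makes the point concrete. Assume (purely for illustration) that both the real peak load and the as-built capacity are normally distributed with a 10% coefficient of variation, centered on the design load and on safety factor x design load respectively. The distributions and the 10% figures are my assumptions, not real engineering data, and `failure_probability` is just a name for this sketch.

```python
import math


def failure_probability(safety_factor, cv_capacity=0.10, cv_load=0.10):
    """P(as-built capacity < actual peak load) under a toy normal model.

    Loads are in units of the design load, so the design load itself is 1.0.
    The coefficients of variation are illustrative assumptions.
    """
    mean_capacity = safety_factor * 1.0
    mean_load = 1.0
    # Standard deviation of (capacity - load) for independent normal variables.
    sigma = math.hypot(cv_capacity * mean_capacity, cv_load * mean_load)
    z = (mean_capacity - mean_load) / sigma
    # Standard normal CDF evaluated at -z.
    return 0.5 * math.erfc(z / math.sqrt(2))


for sf in (1.0, 1.2, 1.5, 1.7):
    print(f"safety factor {sf:.1f}: P(failure) ~ {failure_probability(sf):.2e}")
```

In this model, a safety factor of 1.0 is a coin flip: the expected capacity exactly equals the expected load, so half the bridges fall down. The margin isn't buying a bridge that "really" stays up; it's buying down the odds that ordinary variation pushes load above capacity.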


Nobody in the US builds anything (permitted) with a safety factor of 1.0. Doesn't happen.


Yes, because it would fall down (sometimes, often enough that regulatory bodies forbid it)


The free market ensures that bridges stay up, because the bridge-makers don't want to get sued by people who have died in bridge collapses.


That is definitely not the free market at play. It's the legislative body at play.

Engineers (real ones, not software) face consequences when their work falls apart prematurely. Doubly so when it kills someone. They lose their job, their license, and they can never work in the field again.

That's why it's rare for buildings to collapse. But software collapsing is just another Monday. At best the software firm will get fined when they kill someone, but the ICs will never be held responsible.


This only works when the barrier to entry for suing is low enough, and when the law is applied impartially, without corruption, with sanctions meaningful enough (potentially company-ending) to discourage them.

The moment you remove one of these factors, the free market becomes dangerous for the people living in it.


I'm going to assume this is Poe's Law at work?

