Hacker News | new | past | comments | ask | show | jobs | submit | pu_pe's comments

I'm not sure the direction should be to fine-tune a small local model for each country or language. These models are already not particularly good at information retrieval, so I doubt anyone would use them for questions like the ones the author suggests (i.e., who was president between years X and Y). Similarly, they are a little too lightweight for translation.

If the budget is indeed so modest (5.5 million euros!), I would focus completely on preparing datasets and making sure all open cultural artifacts that we can find are well documented in them. That way every model, private or open, that gets trained in the future could better represent the culture and language of your country.


I agree, the research is complex enough as is without having to worry about splitting it babel-like into multiple languages.

Yeah I think India is going the better route with Sarvam which is trained from scratch and still relatively cheap.

Sometimes code is definitely the bottleneck. For example some organizations have a very bureaucratic process guarding which projects get access to a development team and when. That's not needed if implementation is now faster/cheaper.

I'm also skeptical that development velocity is so separate from all those other things (context, stakeholder alignment, etc.). It's much easier to get actionable feedback when you have a prototype.


So much faster inference with no quality degradation? All that for just some small memory overhead (drafter models are <1B it seems)?

They also published draft models for E4B and E2B. For those, the draft models are only 78m parameters: https://huggingface.co/google/gemma-4-E4B-it-assistant

Is it really no quality degradation?

I'm curious where my understanding is wrong, but I didn't think you necessarily got the exact same output, given how I understand speculative decoding to be used. I thought that if the small model produces tokens that are "good enough", meaning within the top few tokens the larger model would produce, they're accepted.

I thought it doesn't necessarily have to produce the exact same token the larger model would have produced to be accepted (and that requiring this would reduce the hit rate by a lot), just one the big model could have produced with whatever top-k and temperature settings.


It really is. This is because LLMs serving a single output/user are strongly bandwidth-limited. Although the hardware can generate multiple tokens simultaneously, it is slowed down when the tokens depend on each other, as is the case with regular text generation.

The draft model essentially predicts the next token quickly, enabling you to start generating the subsequent token in parallel. If the guess is right, the second generated token is correct. If it is wrong, the second generated token is also potentially wrong, so it must be generated again using the correct prior token obtained through the big model.

A poor draft model will simply slow down the process without affecting the output.


> If the guess is right

This is the crux. What makes the guess "right"?

I think the acceptance criterion is not that the token is exactly the token the big model would have produced. It's accepted if the big model verifies that the probability of that token was high enough.

How close it is to the same output (or same distribution of outputs) you'd get from running the big model alone depends on temperature, top-k, top-p, or other inference parameters.


When running LLMs, there is more compute available than memory bandwidth.

It's like branch prediction: the CPU predicts which branch you'll take and starts executing it. Later you find out which branch you actually took. If the prediction was correct, the speculatively executed code is kept. If the prediction was wrong, it's thrown away, the pipeline is flushed, and execution resumes from the branch point.

The same with this thing: three tokens A, B, C are "predicted", and you start computing all three at the same time, hoping that the prediction checks out. Because of the mathematical structure of the transformer, it costs almost the same to compute three tokens at a time as just one; you are limited by bandwidth, not compute. But critically, each token depends on all the previous ones, so if one of the tokens was predicted wrongly, you need to discard all tokens predicted after it (flush the pipeline). This serial dependency between consecutive tokens is why a prediction is required and why you can't always compute three tokens simultaneously. If you were to start computing three tokens at once without a prediction, then for token C you would need to assume exact values for tokens A and B, but those have not been computed yet. If they were speculatively predicted, however, you can start and hope the prediction was correct.


The token is correct if it matches the one generated by the main model. It works like this:

The draft model quickly generates draft-token 1.

The main model then starts working on two tokens in parallel. It calculates token 1 based on the context, and token 2 based on the context + draft-token 1.

Once the two tokens have been generated, you can check whether the draft-token 1 from the draft model matches token 1 from the main model.

If they match, you have just calculated two tokens in the time it takes to generate one, because the calculation was done in parallel. If they do not match, delete token 2 and generate it again. Since you have already generated the correct token 1 with the big model, you can use the context + token 1 (from the main model). This takes more time, but the result is always the same.
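The loop described above can be sketched in a few lines of Python. This is a hypothetical toy, not any real inference stack: each "model" is a stand-in function mapping a token list (the context) to its greedy next token.

```python
# Toy sketch of "lossless" speculative decoding with greedy sampling.
# Hypothetical stand-ins, not a real inference stack: each model is a
# function mapping a token list (the context) to its greedy next token.

def speculative_decode(big_model, draft_model, context, n_draft=3):
    # 1. The cheap draft model proposes n_draft tokens serially.
    drafted, ctx = [], list(context)
    for _ in range(n_draft):
        tok = draft_model(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # 2. The big model verifies every position. In a real implementation
    #    this is one batched forward pass; here it is a loop for clarity.
    accepted = []
    for i in range(n_draft):
        target = big_model(list(context) + drafted[:i])
        accepted.append(target)          # always keep the big model's token
        if target != drafted[i]:         # mismatch: later drafts are invalid
            break
    return accepted
```

Because only the big model's tokens are ever emitted, the output is identical whether the draft model is good or bad; draft quality only changes how many positions survive verification per pass, i.e., the speed.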


Models do not generate tokens. They generate probabilities for each token.

Inference parameters select a token using those.

You can just select the top token all the time or you can do it probabilistically.

How you do that in both the speculative decoding and the main inference changes how likely you are to get the exact same tokens. And then you can choose to accept only if the token matches exactly, or you can choose to accept if it was reasonably likely to be chosen.

Let's say the main model would have picked the 2nd most likely token and the speculative model picked the most likely. You can reject that, but you get less speedup. Or you can accept it and get more speedup, but you do change the output; you risk the distribution of your outputs not being what you hope.

I am simplifying. I know in https://arxiv.org/pdf/2302.01318 they specify a probability that you reject a token.
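The accept/reject rule from that family of papers can be sketched as a toy (this is not vLLM's actual code): `p` and `q` are the target and draft distributions as plain dicts, the draft token is accepted with probability min(1, p/q), and on rejection a replacement is drawn from the renormalized residual max(0, p - q). That correction is what makes the overall outputs follow the target distribution exactly.

```python
import random

# Toy sketch of the acceptance rule from the speculative sampling papers
# (e.g. arXiv:2302.01318). Hypothetical, not any library's real API:
# p and q are dicts mapping token -> probability for target and draft.

def accept_or_resample(p, q, draft_token, u=None):
    u = random.random() if u is None else u  # pass u to make tests deterministic
    # Accept the draft token with probability min(1, p/q).
    if u < min(1.0, p.get(draft_token, 0.0) / q[draft_token]):
        return draft_token
    # Otherwise sample from the residual distribution max(0, p - q),
    # renormalized; this correction preserves the target distribution.
    residual = {t: max(0.0, p.get(t, 0.0) - q.get(t, 0.0)) for t in p}
    tokens = list(residual)
    return random.choices(tokens, weights=[residual[t] for t in tokens])[0]
```

Note that when the draft and target distributions agree, the acceptance probability is 1, so a perfect draft model never slows things down beyond its own overhead.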


In theory, you could do that and increase the speed at higher temperatures, but it would subtly alter your output based on the draft model's preferences. Rather than picking randomly from the main model's probabilities, you would have to accept a draft model pick if it is close enough.

As far as I know, this is not used in practice. Currently popular implementations always match the main model output, and the draft model only affects the speed.


Here is the line in vLLM's source code that determines if a draft token is accepted:

    accepted = draft_prob > 0 and target_prob / draft_prob >= uniform_prob
It does have a branch that checks only token id equality, which is used if temperature is 0.

Good analysis. That's surprising. I always heard that the draft model doesn't affect the output in any way. It seems they do it like this to achieve faster generation. It would be interesting to investigate how this affects the output.

Edit: I haven't gone through all the code, but they might do something like this: https://arxiv.org/abs/2211.17192 where a draft model is used and the output distribution is tweaked on rejection, resulting in the exact same distribution as the main model.


I have convinced myself that it is in fact the same distribution, even if you don't get the same output on any given run. Pretty cool.

> What makes the guess "right"?

It matches the token that would've been picked without speculative decoding. That seems to be more or less agreed upon.

e.g. vLLM docs list tests they run to ensure that output doesn't change if spec. decoding is used: https://github.com/vllm-project/vllm/blob/main/docs/features...

But introducing some threshold to accept other high-probability tokens is an interesting idea.


By "lossless" I believe they mean "stays within the target distribution". That's what their validation test says it tests. Maybe that means there is no loss in quality in practice, but I don't think it means there is no change in output.

The paper they link to in that first paragraph says you compare logits to accept or reject.


It is only "right" statistically, as in conforming to the same distribution. But there is no guarantee of the exact same output.

Speculative decoding batches multiple completions covering all possible outcomes (0/1/2 draft tokens accepted) and sees if the big model deviates at any point, thus verifying each token. So there's no difference in output.

MTP requires a separate KV cache, so there is more memory overhead than just the weights of the MTP model, but it's a manageable amount.

From the linked post, it didn't read like a separate KV cache was needed:

> The draft models seamlessly utilize the target model's activations and share its KV cache, meaning they don't have to waste time recalculating context the larger model has already figured out.


That's great news. That has not been the case with other MTP implementations like Qwen3.5, but I see the section in the article saying Google introduced some architectural optimizations to make this possible.

It's based on taking advantage of spare compute if you have it. A tiny model generates a few steps ahead first, then the large one runs batch inference on all of those at once as if you are at that point in time. If they all check out afterwards it jumps ahead, otherwise it discards and goes onto the next one.

Not sure about this implementation, but conceptually it only works well on very capable GPUs for very predictable output. Typical speedup is about 30%; I'm not sure how Google is claiming 250%, which seems ridiculous.

And if you don't have enough compute, then you get negative speedup from all the extra overhead.


Memory and compute/energy overhead

I don't think this way because I like to collaborate. If a colleague can benefit from a tool I made I'm proud to save them time. I also think your attitude doesn't pass the golden rule: would you like to work on a team full of people like you?

I tend to agree with you - a rising tide lifts all boats and I want my team to be a rising tide. If I'm at a startup and I'm confident my tool is a good fit for what the rest of the team is doing and there's a genuine teamwork dynamic, oh absolutely I share things like this.

But when I've been stuck for a while in a dysfunctional team, I've definitely seen the flip side where other people will find ways to take a lot of credit for minor iterations on my work, where management will reward my productivity with high expectations and high pressure to continue the trajectory they perceive in a single idea, and when the tool becomes a support burden because too many people think it should solve all of their other problems too and I'm now perceived as being the owner of this thing they depend on.


It does seem like a highly antagonistic way of working or perhaps I'm just naive.

If your only goal is to maintain a performance lead over your peers, you either need to gain and keep an advantage or find ways to actively put your coworkers at a disadvantage (or both). And if you're already doing the first, the second isn't a far stretch.

> would you like to work on a team full of people like you?

If their team is already like this, what choice do they have? It's a prisoner's dilemma where everyone else is defecting and I'm the sole cooperator.

IMO the onus for solving this is on the business owner, either through establishing a knowledge sharing culture or more comprehensive performance evaluation that rewards these innovations.


> I don't think this way because I like to collaborate.

Nice passive aggressive dig!


Some parts of the anti-AI movement are becoming so unhinged that now any use of compute is considered an environmental threat. This degrowth mentality needs to die.

Should I remind you what unlimited growth means and how it ends up in biology? Society/technology is no exception.

No need for unlimited growth, just normal sustainable progress like the one that allows you and me to communicate here after centuries of technological progress.

> No need for unlimited growth

Well then at some point you need to stop growing.


The "AI" craze has been very far from normal or sustainable.

Ah yes, sustainable progress, like we're doing now?

The "normal sustainable progress" has already pushed us to the brink of extinction. AI is rapidly accelerating our resource use, with nothing good to show for it.

How exactly are we "on the brink of extinction"? ("We" as in humans; many other species are obviously not as lucky.)

We are probably on the brink of very bad consequences for a significant fraction of all humans (up to and including all of them, to some extent), which is a huge problem that needs to be addressed.

But what do you gain by incorrectly labeling that as "extinction"? Because you do definitely lose credibility for it, similarly to everybody using hyperbolic language such as "boiling the oceans" etc.


If it's emissions they worry about, then it's anything emitting.

Are they against washing machines too? Or are they just grandfathered in?


This is literally why the EU mandates appliance energy efficiency.

It's never a binary thing. "Is using energy good or bad?" is a stupid question which can only provide stupid answers. It has to be placed in the context of whether it's proportionate to benefit.

Things which burn a lot of energy for little benefit - and in the case of AI, often negative benefit - end up more towards the "bad".


That's a fair point.

I hadn't considered that societies rightfully impose standards on these things.

I consider it too early to judge the cost-benefit, but it's fair that others might have already evaluated that. I rescind my comment.


Don't be disingenuous. Not all energy is created equally.

Are we back to magic water and magic soil? Does the energy have some morality attached to it?

The emissions per kWh of energy used in providing internet downloads probably is similar to that per kWh of energy used for washing clothes.


You're not seriously trying to explain that a kWh is equal to a kWh. Why not cut the crap? Are you trying to say washing clothes is of equal importance to convenience features in a browser, given that we can use each clean kWh only once? I can't tell what you truly mean like this.

>a kWh is equal to a kWh

Yes, and it's none of your business how other people spend their electricity.


That's where we disagree. With our current system so reliant on fossil fuels, every kWh generated is a debt to our planet, our society.

Until that's resolved, I don't wish that debt incurred for frivolous uses.


What do you mean you "disagree"? I pay for the electricity I use and I use it however I want.

Instead of trying to control other people, why can't you start with yourself? Throw away your phone/computer. Go live in a small hut. Practice what you preach.


You read what I wrote, you just chose not to engage with it and went into an ideological creed instead.

You may pay for it, but I and the rest of the planet incur the cost.

I can go live the life of a hermit and the above will still be true.

Your electricity use puts more pollution into our air. It burns our forests. It kills species we all depend on.

No man is an island. Your actions affect others. Just paying your indulgences does not make that basic fact go away.



Still no engagement with actual arguments brought up several posts ago at this point. Still more attempts at derailment.

Speaks for itself. I shall leave it at this then.


[flagged]


[flagged]


[flagged]


[flagged]


[flagged]


[flagged]


>until we coerce the more repugnant parts of society

Go away, troll.


You will notice I let you state your views without going absolutely deranged and resorting to ad hominems.

Why can I not state my philosophical positions without you absolutely freaking out?

If you disagree, you should be able to articulate why. But you don't. Why?

Is it narrow-mindedness? Insecurity? Fear of debate? A nagging feeling I might be right and it would absolutely destroy your identity to admit so?


You are not paying for the total cost of the electricity you use.

You pay for a portion of it, in money.

The other portion of it is belched up into the atmosphere for future generations to pay.

You are incurring debt and forcing it upon others.


>You are incurring debt and forcing it upon others.

You seem to have no problem whatsoever with using electricity yourself. So when do you get to tell me (or anyone else) how to live? And when does it stop? Btw, this is all bizarrely dramatic since we were talking about small local models anyway.

>future generations

Yeah, and some will also say (using the same arguments) that having children is harmful to the planet and we need "measures" to limit that too.


I’m not telling you to do one thing or another. I’m taking issue with your argument that because you pay an electric bill, it follows that you can do whatever you want.

That does not follow logically for me. As humans we disagree about many things, but we generally agree that things that we do often affect others, so one way or another, we need to come together and decide which things are agreed to be acceptable and which things are not.


And I'm not inclined to entertain this nonsense, not even as a hypothetical. I'm not giving up on my most basic and fundamental rights, doubly so when these draconian restrictions won't apply to the people who want to impose them.

Why do you get to tell me (or anyone else) how to live? Why do you get to decide that burning my forest is acceptable?

Not interested, go away.

Our planet is literally dying.

The oceans are boiling [0], marine life is dying [1]. Land close to the water will be land under water soon [2]. The ice caps are melting and setting free all sorts of diseases. [3]

Large parts of our planet are on fire all the time now; here's one from Australia from this year [4], but I'm sure you've read about the wildfires in Australia last year, California every year, Greece last year, etc.

What you're proposing is nothing short of a death cult. It's either degrowth or we all die, sacrificed at the altar of capitalism.

[0] https://www.theguardian.com/environment/2026/jan/09/profound...

[1] https://www.nature.com/articles/s41559-026-03013-5

[2] https://www.nature.com/articles/s43247-025-02299-w

[3] https://www.unep.org/news-and-stories/story/could-microbes-l...

[4] https://phys.org/news/2026-01-australia-declares-state-disas...?


Why do you attribute to capitalism an issue that is much more fundamental than it? People want more stuff and better lives, it's as simple as that. Even hunter-gatherer societies brought themselves to extinction multiple times in the past, and I doubt the USSR would have fared better against climate change.

Technological progress is also societal progress. If we had embraced degrowth in the 1800s (there was a ton of pollution back then, and a Malthusian belief in disaster!) we might not have seen slavery abolished or women gaining the vote.


> People want more stuff and better lives, it's as simple as that.

Not everyone wants this at the cost of others. It's not as simple as that, and it's not a necessary consequence of our desire to find clever solutions to everyday inconveniences.


Because capitalism ties better lives to an ideological belief in unbounded growth.

Will people's lives really be better once they're drowning or choking on wildfire smoke? But hey, at least they had cheap junk!

It's possible to have better lives as well as societal progress without endless growth. Technological progress, too, doesn't have to mean burning our oceans. We just gotta actually think about the costs and consequences of our actions.

Not every technological development is inherently good. Sometimes the cost is not worth the result. I posit the cost of AI so far has been astronomical, higher than anything else in living memory. The results on the other hand have been rather middling.

This is my issue. A cost/benefit analysis, not a strict no to progress.


You're also dying since you were born.

Have you ever made a decision to NOT download something, turn on your computer, experiment, etc based on your perceived impact on the planet?

I mean, this should be (and is being) tackled at the source: zero/low-emission energy generation, not consumers having to think about these decisions. Sustainable data centers using renewables, etc. But companies should not have to weigh bytes downloaded against environmental impact.


>not consumer having to think about these decisions

Consumers vote and advocate for what they want and don't want. There are many who say it's not an individual problem and should be dealt with broadly through regulation, then also oppose any attempts at regulation.


> this should (and is) be tackled at the source: 0/low emission energy generation and not consumer having to think about these decisions.

Until we're at that point though, the 'winners' in this market society (that wield unimaginable amounts of money = resources) such as Google could certainly think about the consequences of their choices. And they usually do to some extent; I'm not saying they don't, just that electricity supply and demand has two sides to it.


I'm going to assume you work in tech and know the issues that come with scale.

Me individually not doing something is absolutely going to be drowned out by the scale of many other people not thinking about it or being incentivized against it.

This is a systemic issue. A systemic issue needs a systemic solution, not a blame shift to the individual.

We didn't get rid of lead in gas or asbestos in walls by telling people it was bad for them. We did so by banning it.


> The NHS launders money the indebted government doesn’t have into terrible health outcomes. This feels like a benefit because it conceals from patients the true cost of their care, while its shortcomings relative to other countries are noticeable only to policy nerds. That’s how most of Europe’s welfare states work.

The UK has less debt than the US and much better average health outcomes, while spending less on health per capita. This is just intellectually dishonest framing of how welfare systems work, ironically in a piece about comparative poverty.


What happens on days when renewables can't produce enough energy? Or in the evenings when we don't have enough batteries (all evenings so far, and for the next decade at least)? You can call it base load or whatever you want, but that energy is coming from hydro, nuclear, or a carbon-based source. And carbon fuels are hard to come by these days, so even if nuclear power is expensive, at least it is reliable.

It takes a decade at least for any new nuclear starting today to come online in the west. In that decade you’ve built an awful lot of batteries for the same amount of money.

No one wants to bet tens of billions of nuclear capex against the relentless progress of batteries and other tech over the next 10 years, and then 30+ years of plant operations. It's a sucker's bet, so the only ones who can take it are nation states.


How about you answer his question?

Given that we don't have nukes, and we won't for 10 years even if we started today, and we aren't going to start them because they're economic disasters...

In the medium term it's going to be batteries + solar/wind + gas backups for rare weather events. If we get the total annual use of gas down to a very achievable 10%, we're still massively winning climate-wise. California is getting there: 45% gas in 2022, 25% gas in 2025, and adding batteries at a massively increasing rate. Full coverage of an average night is within sight, using gas just for shortfalls.

We can hopefully transition the last peaking-gas backup usage to something else in the long term (hydrogen? SMRs, if they ever exist?) but it isn't _that_ important in the grand arc of saving the climate.


So now the discussion is not about whether base load is a thing or not, it is that you firmly believe that batteries are the answer to everything.

First it should be said that this thread is primarily about decomissioning existing nuclear power plants. It makes enormous sense to keep operating those plants until we have a world like the one you describe, regardless of how much newer plants would cost.

But more importantly, your assumptions about the future are very optimistic. I'm sure the Germans also thought they were being very smart when they decided that nuke capex was not worth it because gas was so cheap and easily available, and then now we are finding out that this decision crippled their economy because it caused a dependency. In my opinion throwing all your chips into a technology that requires materials and production capacity you don't have, and in some cases doesn't even exist yet, is a real sucker's bet. All your rosy scenarios would fall apart in one second if China decides to stop selling batteries to you.


> So now the discussion is not about whether base load is a thing or not, it is that you firmly believe that batteries are the answer to everything.

Nope, I'm still talking about the economics of base load. It exists insofar as there is base load _demand_, aka the minimum demand point of the grid. Base load _supply_ is not a thing: there is no rule of nature or economics that says you have to match that minimum demand with a static allocation of unvarying power sources like slow thermal (coal, nukes). That worked for a while as an economic optimization, but on grids with variable sources like wind, solar, and batteries, it no longer works. If your plant has to run at 100% at all times to be profitable (nukes), your economic model is now broken.

> First it should be said that this thread is primarily about decomissioning existing nuclear power plants. It makes enormous sense to keep operating those plants until we have a world like the one you describe, regardless of how much newer plants would cost.

Yep, I have absolutely no objections to keeping existing plants running; that's a smart thing to do. It's building new ones that doesn't make economic sense anymore.

> All your rosy scenarios would fall apart in one second if China decides to stop selling batteries to you.

True, but it's easier to build a homegrown battery manufacturing industry than a nuclear one.


I wouldn't be so sure about that, considering Northvolt. And it's irrelevant anyway; the plan, including Germany's, is to use gas firming.

> The bottleneck in the AI era is not production. It is discernment.

> The right question to ask after a vibe-coded prototype fails is not what did the AI do wrong. It is what did our process miss.

> That is a governance story, not a software story.

> The Question Is Not Adoption. It Is Readiness.

> The right question is diagnostic, not strategic.

I don't know if AI will fully replace programmers, but it has already replaced writers of this type of bullshit puff piece.


Mistral has a very difficult scenario to navigate. Training models in Europe is difficult and expensive because of regulations and energy prices. Their own open models are lagging behind the Chinese ones. That means eventually they will turn into an inference-only enterprise running mostly Chinese open models, at which point any other European player could compete (Hetzner, OVHCloud, etc.)


Well, they can train in any country they want. It's the inference and data placement that counts for legal purposes.


The regulatory concerns are worldwide: the GDPR has restrictions on the territorial location of data, so you cannot move data anywhere other than the EU or "adequate" countries (in practice, the US). Since the real gold is in using data that users submitted to you (i.e., GDPR-protected data), they are kind of stuck in regards to where they can train.

Mistral's stack already heavily relies on American cloud providers and they have tons of American investors, so its sovereignty angle is dubious anyway.


Often you have to choose the 'least bad' choice. Mistral may be that, from an EU POV.


Is this scenario far-fetched? Just as nations pay large companies to build a factory in their country, nations will similarly pay AI companies to build a national AI model for their own consumption, because AI is that beneficial.


They have some pretty cool people, though, no reason not to think they'll catch up soon enough.


It's a risk, but since they have training expertise they should be able to distill the best open-source models to reach at least approximate parity comfortably. Frontier-model territory looks increasingly out of reach for anyone without $100B for training, and then you have to serve inference to recoup the cost; that's an expensive proposition in the EU.

...OTOH the cost of not sponsoring this in Europe may be complete technological obsolescence. Rock and a hard place situation.


The US single-handedly dominating AI at this point probably means a handful of tech overlords in charge of a surveillance society which depends on AI for everything, with some vague promises that everyone else will get some sort of allowance if they feel benevolent enough. For all existential risks discussed about ASI or whatever, having an oligarchy in complete control of this tech is maybe even worse.

So, I guess we all have to hope that more money does not necessarily lead to a "victory" here.

