I haven't shouted into the void for a while. Today is as good a day as any other to do so.
I feel extremely disempowered that these coding sessions are effectively black box, and non-reproducible. It feels like I am coding with nothing but hopes and dreams, and the connection between my will and the patterns of energy is so tenuous I almost don't feel like touching a computer again.
A lack of determinism comes from many places, but primarily:
1) The models change
2) The models are not deterministic
3) The history of tool use and chat input is not available as a first-class artifact for later use.
I would love to see a tool that logs the full history of all agents that sculpt a codebase, including the inputs to tools, tool versions, and any other sources of entropy. Logging the seeds fed into the RNGs that drive LLM output would be the final piece that would give me the confidence to consider using these tools seriously.
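Concretely, I imagine something like an append-only log where every agent step becomes a structured record. A minimal sketch of what one entry might hold (the field names here are hypothetical, just to make the shape concrete):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AgentStep:
    """One hypothetical entry in an append-only provenance log for an agent session."""
    session_id: str               # which agent session produced this step
    model_id: str                 # exact model name, version, or checkpoint hash
    prompt: str                   # full chat/system input sent to the model
    sampling_seed: Optional[int]  # RNG seed used for token sampling, if the provider exposes it
    temperature: float            # sampling parameters that affect the output
    tool_name: Optional[str]      # e.g. "shell" or "edit_file", if a tool was invoked
    tool_version: Optional[str]   # exact version of the tool binary that ran
    tool_input: Optional[str]     # verbatim input passed to the tool
    tool_output: Optional[str]    # verbatim output fed back into the context
    diff: Optional[str] = None    # the patch this step applied to the codebase, if any
```

With records like that you could at least replay a session against the same model snapshot and see exactly where it diverges, which is all I'm really asking for.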
I write this now after what I am calling "AI disillusionment", a state where I feel so disconnected from my codebase I'd rather just delete it than continue.
Having a set of breadcrumbs would give me at least a modicum of confidence that the work was reproducible and not the product of some modern ghost, completely detached from my will.
Of course this would require actually owning the full LLM.
> A lack of determinism comes from many places, but primarily: 1) The models change 2) The models are not deterministic...
models themselves are deterministic, this is a huge pet peeve of mine, so excuse the tangent, but the appearance of nondeterminism comes from a few sources, and imho can be largely attributed to the probabilistic methods used to get appropriate context and enable timely responses. here's an example of what I mean: a 52-card deck. The deck order is fixed once you shuffle it. Drawing "at random" is a probabilistic procedure on top of that fixed state. We do not call the deck probabilistic. We call the draw probabilistic. Another example: a pot of water heating on a stove. Its temperature follows deterministic physics. A cheap thermometer adds noisy, random error to each reading. We do not call the water probabilistic. We call the measurement probabilistic.
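If it helps, the same split in code (a toy sketch, nothing LLM-specific): the deck's state is fixed by the shuffle seed; only the draw on top of it is random.

```python
import random

# The deck's state is deterministic: shuffling with a fixed seed
# yields the same order on every run.
rng = random.Random(1234)
deck = list(range(52))
rng.shuffle(deck)       # same seed -> same deck, always

# The observation is probabilistic: drawing with an unseeded RNG
# varies from run to run, even though the deck itself never changed.
card = random.choice(deck)

print(deck[:5])         # identical every run
print(card)             # varies every run
```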
Theoretical physicists run into such problems, albeit far more complicated ones, and the concept they use to deal with them is called ergodicity. The models at the root of LLMs do exhibit ergodic behavior; the time average and the ensemble average of an observable are identical, i.e. the average response of a single model over a long duration and the average of many similar models at a fixed moment are equivalent.
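In symbols, the standard statement (nothing LLM-specific here): for an observable f, ergodicity means the long-run time average along one trajectory equals the average over the ensemble,

```latex
\lim_{T \to \infty} \frac{1}{T} \int_0^T f(x_t)\, dt \;=\; \int f(x)\, d\mu(x)
```

where μ is the stationary distribution of the ensemble. Read loosely: one model queried many times and many copies of the model queried once give you the same statistics.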
The previous poster is correct for a very slightly different definition of the word "model". In context, I would even say their definition is the more correct one.
They are including the random sampler at the end of the LLM that chooses the next token. You are talking about up to, but not including, that point. But that just gives you a list of possible output tokens with values ("probabilities"), not a single choice. You can always just choose the best one, or you could add some randomness that does a weighted sample of the next token based on those values. From the user's perspective, that final sampling step is part of the overall black box that is running to give an output, and it's fair to define "the model" to include that final random step.
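To make that split concrete, here's a toy sketch (plain Python with made-up logits standing in for a forward pass, not any real inference stack): everything up to the scores is a deterministic function of the input; the sampler on top is where greedy vs. weighted choice comes in.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into a probability distribution over next tokens."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Deterministic part: for a fixed prompt (and fixed weights), the model
# always produces the same scores over candidate next tokens.
vocab  = ["return", "print", "assert", "raise"]
logits = [2.1, 1.3, 0.2, -0.5]          # made-up values standing in for a forward pass
probs  = softmax(logits)

# Greedy decoding: always pick the highest-probability token. Fully reproducible.
greedy = vocab[max(range(len(probs)), key=probs.__getitem__)]

# Stochastic decoding: weighted sample from the same distribution.
# Reproducible only if you fix (and record) the seed.
rng = random.Random(42)
sampled = rng.choices(vocab, weights=probs, k=1)[0]

print(greedy, sampled, [round(p, 3) for p in probs])
```

Temperature just rescales the scores before the softmax; push it toward zero and the weighted sample collapses to the greedy pick, which is why logging the seed and sampling parameters recovers reproducibility.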
but, to be fair, simply calling the sampler random is what gives people impressions like the one OP is complaining about, which isn't entirely accurate; it's actually fairly bounded.
this plays back into my original comment: the sampler, for all its "randomness", should only be seeing and picking from a variety of correct answers, i.e. the sample pool should only contain acceptable answers to "randomly" pick from. so when there are bad or nonsensical answers that are different every time, it's not because the models are too random, it's because they're dumb and need more training. tweaking your architecture isn't going to fully prevent that.
The stove keeps burning me because I can't tell how hot it is, it feels random and the indicator light is broken.
You:
The most rigorous definition of temperature is that it is equal to the inverse of the rate of change of entropy with respect to internal energy, within a given volume V and particles N held constant.
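In symbols (standard thermodynamics; S is entropy, U internal energy, V and N held fixed):

```latex
\frac{1}{T} = \left( \frac{\partial S}{\partial U} \right)_{V,\,N}
```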
All accessible microstates are equiprobable over a long period of time, this is the very definition of ergodicity! Yet, because of the flow of entropy the observed macrostates will remain stable. Thus, we can say that the responses of a given LLM are...
The User:
I'm calling the doctor, and getting a new stove with an indicator light.
Well really, the reason I gripe about it, to use your example, is that people then believe the indicator light malfunctioning is an intrinsic feature of stoves, so they throw their stove out and start cooking over campfires instead: tried and true, predictable, whatever that means.
I think my deck of cards example still holds.
You could argue I'm being uselessly pedantic, that could totally be the case, but personally I think that's cope to avoid having to think very hard.
I share the sentiment. I would add that the people I would like to see use LLMs for coding (and other technical purposes) tend to be jaded like you, and the people I personally wouldn't want to see use LLMs for that tend to be pretty enthusiastic.
Maybe just take a weekend and build something by writing the code yourself. It's the feeling of pure creative power, it sounds like you've just forgotten what it was like.
Yeah, tbh I used to be a bit agentic coding tool-pilled, but over the past four months I've come to realize that if this industry evolves in a direction where I don't actually get to write code anymore, I'm just going to quit.
Code is the only good thing about the tech industry. Everything else is capitalist hellscape shareholder dystopia. Thinking on it, it's hilarious that any self-respecting coder is excited about these tools, because what you're excited for is a world where, now, at best, your entire job is managing unpredictable AI agents while sitting in meetings all day to figure out what to tell your AI agents to build. You don't get to build the product you want. You don't get to build it how you want. You'll be a middle manager who gets to orchestrate the arguments between the middle manager you already had and the inflexible computer.
You don't have to participate in a future you aren't interested in. The other day my boss asked me if I could throw Cursor at some task we've had backlogged for a while. I said "for sure my dude" and then I just did it myself. It took me like four hours, and my boss was very impressed with how fast Cursor was able to do it, and how high quality the code was. He loves the Cursor metrics dashboard for "lines accepted" or whatever; every time he screenshares he has that tab open, so sometimes I task it on complicated nonsense tasks and then just throw away the results. Seeing the numbers go up makes him happy, which makes my life easier, so it's a win-win. Our CTO is really proud of "what percentage of our code is AI written", but I'm fairly certain that even the engineers who use it in earnest actually commit, like, 5% of what Cursor generates (and many do not use it in earnest).
The sentiment shift I've observed among friends and coworkers has been insane over the past two months. Literally no one cares about it anymore. The usage is still there, but it's a lot more either my situation or just a "spray and pray" situation that creates a ton of disillusioned water cooler conversations.
None of the open weight models are really as good as the SOTA stuff, whatever their evals say. Depending on the task at hand this might not actually manifest if the task is simple enough, but once you hit the threshold it's really obvious.
> where I feel so disconnected from my codebase I'd rather just delete it than continue.
If you allow your codebase to grow unfamiliar, even unrecognisable to you, that's on you, not the AI. Chasing some illusion of control via LLM output reproducibility won't fix the systemic problem of you integrating code that you do not understand.
It's not blame, it's useful feedback. For a large application you have to understand what different parts are doing and how everything is put together, otherwise no amount of tools will save you.
The process of writing the code, thinking all the while, is how most humans learn a codebase. Integrating alien code sequentially disrupts this process, even if you understand individual components.
The solution is to methodically work through the codebase, reading, writing, and internalizing its structure, and comparing that to the known requirements.
And yet, if this is always required of you as a professional, what value did the LLM add beyond speeding up your typing while delaying the required thinking?
With sufficient structure and supervision, will a "team" of agents out-perform a team of humans?
Military, automotive, and other industries have developed rigorous standards consisting of, among other things, detailed processes for developing software.
Can there be an AI waterfall? With sufficiently unambiguous, testable requirements, and a nice scaffolding of process, is it possible to achieve the dream of managers, and eliminate software engineers? My intuition is evenly split.