Just popping in here because people seem to be surprised by
> I build on the exact hardware I intend to deploy my software to and ship it to another machine with the same specs as the one it was built on.
This is exactly the use case in HPC. We always build -march=native and go to some trouble to enable all the appropriate vectorization flags (e.g., for PowerPC) that don't come along automatically with the -march=native setting.
Every HPC machine is a special snowflake, often with its own proprietary network stack, so you can forget about binaries being portable. Even on your own machine you'll be recompiling your binaries every time the machine goes down for a major maintenance.
Please don't take up space in the comment section with accusations. You can report this at the email below and the mods will look at it:
> Please don't post insinuations about astroturfing, shilling, brigading, foreign agents, and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email [email protected] and we'll look at the data.
I find it kind of helpful and interesting to see a subset of these called out with a bit of data. Helps keep my LLM detector trained (the one in my brain, that is) and I think it helps a little about expressing the community consensus against this crap. In this case, I'm glad the GP posted something, as it's definitely not mistaken.
Without an empirical methodology it's hard to know how true this is. There are known and well-documented human biases (e.g., placebo effect) that could easily be involved here. And besides that, there's a convincing (but often overlooked on HN) argument to be made that modern LLMs are optimized in the same manner as other attention economy technologies. That is to say, they're addictive in the same general way that the YouTube/TikTok/Facebook/etc. feed algorithms are. They may be useful, but they also manipulate your attention, and it's difficult to disentangle those when the person evaluating the claims is the same person (potentially) being manipulated.
I'd love to see an empirical study that actually dives into this and attempts to show one way or another how true it is. Otherwise it's just all anecdotes.
At least in some instances you could frame it that way: You believe that doctors and medicine are effective at treating disease, so when you are sick and a doctor gives you a bottle of sugar pills and you take them, you now interpret your state through the lens that you should feel better. A bias on how you perceive your condition
That's not all that the placebo effect is. But it's probably the aspect that best fits the framing as bias
It depends highly on the field. In history, sure. The point of getting a history PhD is to become a history professor, and you can't do that if you don't get the PhD, and meanwhile history PhDs don't meaningfully open up any other job prospects, so attempting and failing to get a PhD provides negative value.
In CS and many engineering disciplines, there is a long history of people dropping out of PhDs and landing in industry. The industry is therefore much more accustomed to, and therefore accommodating to, people taking this path. Whether it's a maximally efficient use of time is another question, but it's certainly not wasted effort.
But I do agree that it's stressful nonetheless because it still feels like a failure even if it is not actually in reality. I wrote about this when I put down my own PhD journey here [1]. In particular after the control replication (2017) paper, I very nearly quit out of academia entirely despite it being my biggest contribution to the field by far.
For context I've been an AI skeptic and am trying as hard as I can to continue to be.
I honestly think we've moved the goalposts. I'm saying this because, for the longest time, I thought that the chasm that AI couldn't cross was generality. By which I mean that you'd train a system, and it would work in that specific setting, and then you'd tweak just about anything at all, and it would fall over. Basically no AI technique truly generalized for the longest time. The new LLM techniques fall over in their own particular ways too, but it's increasingly difficult for even skeptics like me to deny that they provide meaningful value at least some of the time. And largely that's because they generalize so much better than previous systems (though not perfectly).
I've been playing with various models, as well as watching other team members do so. And I've seen Claude identify data races that have sat in our code base for nearly a decade, given a combination of a stack trace, access to the code, and a handful of human-written paragraphs about what the code is doing overall.
This isn't just a matter of adding harnesses. The fields of program analysis and program synthesis are old as dirt, and probably thousands of CS PhD have cut their teeth of trying to solve them. All of those systems had harnesses but they weren't nearly as effective, as general, and as broad as what current frontier LLMs can do. And on top of it all we're driving LLMs with inherently fuzzy natural language, which by definition requires high generality to avoid falling over simply due to the stochastic nature of how humans write prompts.
Now, I agree vehemently with the superficial point that LLMs are "just" text generators. But I think it's also increasingly missing the point given the empirical capabilities that the models clearly have. The real lesson of LLMs is not that they're somehow not text generators, it's that we as a species have somehow encoded intelligence into human language. And along with the new training regimes we've only just discovered how to unlock that.
> I thought that the chasm that AI couldn't cross was generality. By which I mean that you'd train a system, and it would work in that specific setting, and then you'd tweak just about anything at all, and it would fall over. Basically no AI technique truly generalized for the longest time.
That is still true though, transformers didn't cross into generality, instead it let the problem you can train the AI on be bigger.
So, instead of making a general AI, you make an AI that has trained on basically everything. As long as you move far enough away from everything that is on the internet or are close enough to something its overtrained on like memes it fails spectacularly, but of course most things exists in some from on the internet so it can do quite a lot.
The difference between this and a general intelligence like humans is that humans are trained primarily in jungles and woodlands thousands of years ago, yet we still can navigate modern society with those genes using our general ability to adapt to and understand new systems. An AI trained on jungles and woodlands survival wouldn't generalize to modern society like the human model does.
And this makes LLM fundamentally different to how human intelligence works still.
Iteration is inherent to how computers work. There's nothing new or interesting about this.
The question is who prunes the space of possible answers. If the LLM spews things at you until it gets one right, then sure, you're in the scenario you outlined (and much less interesting). If it ultimately presents one option to the human, and that option is correct, then that's much more interesting. Even if the process is "monkeys on keyboards", does it matter?
There are plenty of optimization and verification algorithms that rely on "try things at random until you find one that works", but before modern LLMs no one accused these things of being monkeys on keyboards, despite it being literally what these things are.
Of course it doesn't matter indeed. What I was hinting at is if you forget all the times the LLM was wrong and just remember that one time it was right it makes it seem much more magical than it actually might be.
Also how were the data races significant if nobody noticed them for a decade ? Were you all just coming to work and being like "jeez I dont know why this keeps happening" until the LLM found them for you?
I agree with your points. Answering your one question for posterity:
> Also how were the data races significant if nobody noticed them for a decade ?
They only replicated in our CI, so it was mainly an annoyance for those of us doing release engineering (because when you run ~150 jobs you'll inevitably get ~2-4 failures). So it's not that no one noticed, but it was always a matter of prioritization vs other things we were working on at the time.
But that doesn't mean they got zero effort put into them. We tried multiple times to replicate, perhaps a total of 10-20 human hours over a decade or so (spread out between maybe 3 people, all CS PhDs), and never got close enough to a smoking gun to develop a theory of the bug (and therefore, not able to develop a fix).
To be clear, I don't think "proves" anything one way or another, as it's only one data point, but given this is a team of CS PhDs intimately familiar with tools for race detection and debugging, it's notable that the tools meaningfully helped us debug this.
It's not just about the increase in volume, it's about the delta between the prompt and the generation.
If the generation merely restates the prompt (possibly in prettier, cleaner language), then usually it's the case that the prompt is shorter and more direct, though possibly less "correct" from a formal language perspective. I've seen friends send me LLM-generated stuff and when I asked to see the prompt, the prompts were honestly better. So why bother with the LLM?
But if you're using the LLM to generate information that goes beyond the prompt, then it's likely that you don't know what you're talking about. Because if you really did, you'd probably be comfortable with a brief note and instructions to go look the rest up on one's own. The desire to generate more comes from either laziness or else a desire to inflate one's own appearance. In either case, the LLM generation isn't terribly useful since anyone could get the same result from the prompt (again).
So I think LLMs contribute not just to a drowning out of human conversation but to semantic drift, because they encourage those of us who are less self-assured to lean into things without really understanding them. A danger in any time but certainly one that is more acute at the moment.
It's that, but it's also that the incentives are misaligned.
How many supposed "10x" coders actually produced unreadable code that no one else could maintain? But then the effort to produce that code is lauded while the nightmare maintenance of said code is somehow regarded as unimpressive, despite being massively more difficult?
I worry that we're creating a world where it is becoming easy, even trivial, to be that dysfunctional "10x" coder, and dramatically harder to be the competent maintainer. And the existence of AI tools will reinforce the culture gap rather than reducing it.
It's a societal problem we are just seeing the effects in computing now. People have given up, everything is too much, the sociopaths won, they can do what they want with my body mind and soul. Give me convenience or give me death.
Not the same industry but at least one literary agent does this: if you physically print and mail your book proposal, they will respond with a short but polite, physical rejection letter if they reject you.
But I think it's a generational thing. The younger agents I know of just shut down all their submissions when they get overwhelmed, or they start requiring everyone to physically meet them at a conference first.
> I build on the exact hardware I intend to deploy my software to and ship it to another machine with the same specs as the one it was built on.
This is exactly the use case in HPC. We always build -march=native and go to some trouble to enable all the appropriate vectorization flags (e.g., for PowerPC) that don't come along automatically with the -march=native setting.
Every HPC machine is a special snowflake, often with its own proprietary network stack, so you can forget about binaries being portable. Even on your own machine you'll be recompiling your binaries every time the machine goes down for a major maintenance.
reply