I have sometimes found "LARPing job roles" to be useful for setting expectations for the codebase.
Claude is kind of decent at doing "when in Rome" sort of stuff with your codebase, but it's nice to reinforce that, and to remind it how to deploy, what testing should be done before a PR, etc.
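For example, a hypothetical CLAUDE.md excerpt along these lines (the role, commands, and rules here are made up, not from any real project):

    # CLAUDE.md (hypothetical excerpt)
    You are acting as the senior backend engineer on this repo.
    - Deploys: only via the CI pipeline; never push to prod directly.
    - Before opening any PR: run `make test` and `make lint` locally.
    - Follow the existing module layout; don't introduce new frameworks.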
Regardless, published papers aren't an authoritative source of truth. They're just a note to your friends: "hey, I did some cool stuff I want to tell you about!"
Sure, it's slightly more reviewed than a GitHub repo, but it's not the be-all and end-all.
> The vast majority of Linux kernel performance improvement patches probably have way less of a real world impact than this.
Unlikely, given that the multiplier on every kernel improvement (how often that code actually runs) is far higher than "times jq is run in some pipeline". Even a 0.1% improvement in the kernel probably has far, far higher impact than this.
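A rough back-of-envelope sketch in Python; both the daily counts and the gains below are made-up illustrative assumptions, not data, and the point is only the size of the multiplier:

    # Made-up illustrative numbers; the point is the multiplier, not the figures.
    kernel_ops_per_day = 1e18   # assumed: kernel code paths exercised across all Linux machines
    jq_runs_per_day = 1e9       # assumed: jq invocations in pipelines worldwide

    kernel_gain = 0.001         # a 0.1% kernel improvement
    jq_gain = 0.5               # even a 50% jq speedup

    print(kernel_ops_per_day * kernel_gain)  # 1e15 units of work saved
    print(jq_runs_per_day * jq_gain)         # 5e8 units of work saved

Under those assumptions the kernel patch wins by several orders of magnitude, even with a much smaller percentage gain.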
It's not just that - anything working with analog signals benefits hugely from not living inside the complete EM interference nightmare of the computer case.
I see AI pass the Turing test all the time, since humans are constantly being falsely accused of being an AI.
It doesn't mean that AI got good, just that humans are thinking other humans are AI, which is a form of passing the test.
The adversarial version with humans involved is actually easier to pass because of this: real, actual humans wouldn't pass your non-adversarial version.
I've seen a fair number of cases where someone swears up and down not to be using AI to generate responses, but there's no good reason to believe it (except perhaps specifically for the messages where that claim is made).
This includes times that someone basically disappeared from e.g. Stack Overflow at some point before the release of ChatGPT, having written a bunch of posts that barely demonstrate functional literacy or comprehension of English; and then came back afterward posting long messages with impeccable grammar and spelling in textbook "LLM house style".
It's not just patterns like "not just X, but Y", but also deeper patterns and a kind of narrative cadence. Sure, it's also mimicking something real, but usually there's a mismatch between the insightfulness of the content and the quality of the delivery. It feels like chewing on empty calories; it's missing the intentionality and the edge of being human. I guess you need to read a lot of LLM output to get a feel for this beyond the surface-level pattern matching.
I wonder whether AI house style is the result of the people training it having no sense of writing style or some kind of technical limitation.
With AI, there is no sense of the level of emphasis matching the meaning of the text, or a long-range dramatic arc - everything is a revelation, like somebody who can only speak in TED talks. Everything is extremely earnest, very important, and presented using the same five flashy language hacks.
It was a joke. But also, my use of "not X but Y" is not rhetorical but declarative. The whole point is that what many of us are talking about is not simply these surface patterns but how they are used, and how the narrative rhythm of the sentences and paragraphs goes.
I believe that, much the same way as a fighter jet designed for the average pilot doesn't fit any of them, the 'average' of written text ends up reading like an LLM without being able to find a 100% matching sample.
> I certainly used em dashes before 2022, and so did anyone who cares about proper typography.
Who said anything about em-dashes? There's an entire Wikipedia page documenting the tells, and only one of the 15 or so items is "extended usage of em-dashes".
I don't think there's any definitive way to check, but for me one of the biggest tells that a long piece of writing was LLM generated is that it will hardly say anything given how many words are in it.
(well that and the "it's not just x, it's y!" pattern they seem to love)
But it's also often a shoehorned, artificial contrast that doesn't really make sense. The Y is often not such a different thing from the X as to make it worthy of an actual "not just X but Y" claim. Or the Y is a vague subjective term, or some kind of fancy-word-dropping. It's strong styling but little content, similar to politician CYA talk. I don't think it's necessarily a tech limitation, more an effect of deliberate post-training to be middle-of-the-road, nonoffensive, and nonopinionated.
In one study, GPT-4.5 was judged to be human 73% of the time, which means that the actual human was judged to be human only 27% of the time. More human than human, as Tyrell would say.
Edit: folks, the standard Turing test involves a computer and a human, and then a judge communicating with both and giving a verdict about which one is the human. The percentages for the two entities being judged will add up to exactly 100%. That's how this test was conducted. Please don't assume I'm a moron.
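A minimal sketch of that forced-choice arithmetic, with a hypothetical trial count:

    # Forced-choice setup: in each trial the judge sees one AI and one human
    # and must pick exactly one as the human, so the two rates sum to 100%.
    trials = 1000                # hypothetical number of trials
    ai_judged_human = 730        # AI picked as the human 73% of the time

    human_judged_human = trials - ai_judged_human
    print(human_judged_human / trials)  # 0.27, by construction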
The implication would be that GPT-4.5 was not judged to be human 27% of the time. You can't determine how often humans were judged correctly as humans from that data point.
No one should have nuclear weapons; we ought to have robust policy, institutions, and vigilance to prevent their proliferation and use.
Computerized vehicles ought to be strictly regulated in terms of how computers may affect the physical operation of the car, such that a reasonable standard of safety can be ensured beyond the usual risk one takes when hopping in a motor vehicle. The fact that a hacker can possibly kill people by rooting an infotainment system is a symptom of a general disregard for security in design, and we continue to ignore it for engineering expediency.