Have you tried writing something like this into AGENTS.md: "Always be on the lookout for dead code, copy-pasta, and other opportunities to optimize and trim the codebase in a sensible way"?
In my experience, adding this kind of instruction to the context window causes SOTA coding models to actually undertake that kind of optimization while development carries on. You can also periodically chuck your entire codebase into Gemini-3 (with its massive context window) and ask it to write a refactoring plan; then, pass that refactoring plan back into your day-to-day coding environment such as Cursor or Codex and get it to take a few turns working away at the plan.
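The "chuck the whole codebase into a big-context model" step can be a small script that concatenates source files behind a refactoring prompt. A rough sketch (ask_model() is just a placeholder for whatever client or UI you use, not a real API):

    from pathlib import Path

    EXTENSIONS = {".py", ".ts", ".go", ".rs", ".md"}

    def build_refactor_prompt(repo_root: str) -> str:
        # Gather the repo contents behind a single instruction.
        parts = ["Write a prioritized refactoring plan for this codebase. "
                 "Call out dead code, duplication, and over-complicated modules.\n"]
        for path in sorted(Path(repo_root).rglob("*")):
            if path.is_file() and path.suffix in EXTENSIONS:
                parts.append(f"\n===== {path} =====\n{path.read_text(errors='ignore')}")
        return "".join(parts)

    # plan = ask_model(build_refactor_prompt("."))   # hypothetical model call
    # Path("REFACTOR_PLAN.md").write_text(plan)      # hand this file to Cursor/Codex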
As with human coders, if you let them run wild "improving" things without specifically instructing them to also pay attention to bloat, bloat is precisely what you will get.
This should be a native feature of the native chat apps for all major LLM providers. There’s no reason why PII can’t be masked from the API endpoint and then replaced again when the LLM responds. “Mary Smith” becomes “Samantha Robertson” and then back to “Mary Smith” on responses from the LLM. A small local model (such as the BERT model in this project) detects the PII.
Something like this would greatly increase end user confidence. PII in the input could be highlighted so the user knows what is being hidden from the LLM.
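A rough sketch of the masking flow, using an off-the-shelf BERT NER model from Hugging Face (the model name and pseudonym list are illustrative choices, not necessarily what the project in question uses):

    from transformers import pipeline

    # Small local NER model finds person names; a reversible map swaps them
    # for pseudonyms before the API call and restores them afterwards.
    ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
    PSEUDONYMS = ["Samantha Robertson", "Alex Turner", "Priya Nair"]

    def mask(text):
        people = [e["word"] for e in ner(text) if e["entity_group"] == "PER"]
        mapping, masked = {}, text
        for i, name in enumerate(dict.fromkeys(people)):  # dedupe, keep order
            alias = PSEUDONYMS[i % len(PSEUDONYMS)]
            mapping[alias] = name
            masked = masked.replace(name, alias)
        return masked, mapping

    def unmask(text, mapping):
        for alias, name in mapping.items():
            text = text.replace(alias, name)
        return text

    masked_prompt, mapping = mask("Mary Smith asked about her invoice.")
    # send masked_prompt to the LLM API, then:
    # reply = unmask(llm_reply, mapping)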
Type-safe message-passing is such a wonderful programming paradigm - and not just for distributed applications. I remember using QNX back in the 1990s. One of its fabulous features was a C message-passing library that let you send arbitrary binary structs from one process to another. In realtime software development, you often find yourself with one process that watches for events from a certain device, modifies the information somehow, and then passes it on to another process that ends up doing something else. The message-passing idiom was far superior to what was available in Linux at the time (pipes and whatnot) because you were able to work with C structs. It was not strictly type safe (unlike FoundationDB's library), but for the 1990s it was pretty great.
I remember that ASN.1 does something similar. You'd give an ASN.1 notation to a language generator (producing C, for example) and not have to worry about parsing the actual structure anymore!
Unfortunately Thrift is a dead (AKA "Apache") project and it doesn't seem like anyone since has tried to do this. It probably didn't help that there are so many gaps in that support matrix. I think "Google have made a thing! Let's blindly use it!" also contributed to its downfall, despite Thrift being better than Protobuf (it even supports required fields!).
Actually I just took a look at the Thrift repo and there are a surprising number of commits from a couple of people consistently, so maybe it's not quite as dead as I thought. You never hear about people picking it for new projects though.
FB maintains a distinct version of Thrift from the one they gave to Apache. fbthrift is far from dead as it's actively used across FB. However in typical FB fashion it's not supported for external use, making it open source in name (license) only.
As an interesting historical note, Thrift was inspired by Protobuf.
Very true. ASN.1 is mostly not a great fit, yet it has been the choice for everything to do with certificates and telecommunication protocols (even newer ones like 5G, for things like RRC and NGAP), mostly for its bit-level support and especially its long-term stability.
* And looking back in time, ASN.1 has definitely proven its long-term support.
actually never heard of thrift until today, thanks for the insight :)
Honestly, first time I've seen someone praising Thrift in a long time.
Wanted to do unspeakable and evil things to the people responsible for choosing it, as well as its authors, the last time I worked on a project that used Thrift extensively.
Prompt understanding will only ever be as good as the language embeddings that are fed into the model’s input. Google’s hardware can host massive models that will never be run on your desktop GPU. By contrast, Flux and its kin have to make do with relatively tiny LLMs (Qwen Image uses a 7B-param LLM).
This post raises genuine concerns about the integration of large language models into creative and technical work, and the author writes with evident passion about what they perceive as a threat to human autonomy and craft. BUT… the piece suffers from internal contradictions, selective reasoning, and rhetorical moves that undermine its own arguments in ways worth examining carefully.
My opinion: This sort of low-evidence writing is all too common in tech circles. It makes me wish computer science and engineering majors were forced to spend at least one semester doing nothing but the arts.
The most striking inconsistency emerges in how the author frames the people who use LLM tools. Early in the piece, colleagues experimenting with AI coding assistants are described in the language of addiction and pathology: they are “sucked into the belly of the vibecoding grind,” experiencing “existential crisis,” engaged in “harmful coping.” The comparison to watching a friend develop a drinking problem is explicit and damning. This framing treats AI adoption as a personal failure, a weakness of character, a moral lapse. Yet only paragraphs later, the author pivots to acknowledging that people are “forced to use these systems” by bosses, UI patterns, peer pressure, and structural disadvantages in school and work. They even note their own privilege in being able to abstain. These two framings cannot coexist coherently. If using AI tools is coerced by material circumstances and power structures, then the addiction metaphor is not just inapt but cruel — it assigns individual blame for systemic conditions. The author wants to have it both ways: to morally condemn users while also absolving them as victims of circumstance.
This tension extends to the author’s treatment of their own social position. Having acknowledged that abstention from LLMs requires privilege, they nonetheless continue to describe AI adoption as a “brainworm” that has infected even “progressive hacker circles.” The disgust is palpable. But if avoiding these tools is a luxury, then expressing contempt for those who cannot afford that luxury is inconsistent at best and self-congratulatory at worst. The acknowledgment of privilege becomes a ritual disclaimer rather than something that actually modifies the moral judgments being rendered.
The author’s claims about intentionality represent another significant weakness. The assertion that AI systems being resource-intensive “is not a side effect — it’s the point” is presented as revelation, but it functions as an unfalsifiable claim. No evidence is offered that anyone designed these systems to be resource-hungry as a mechanism of control. The technical requirements of training large models, competitive market pressure to scale, and the emergent dynamics of venture capital investment all offer more parsimonious explanations that don’t require attributing coordinated malicious intent. Similarly, the claim that “AI systems exist to reinforce and strengthen existing structures of power and violence” is stated as though it were established fact rather than contested interpretation. This is the central claim of the piece, and yet it receives no argument — it is simply asserted and then built upon, which amounts to begging the question.
The essay also suffers from a pronounced selection bias in its examples. Every person described using AI tools is in crisis, suffering, or compromised. No one uses them mundanely, critically, or with benefit. This creates a distorted picture that serves rhetorical purposes but does not reflect the range of actual use cases. The author’s friends who share their anti-AI sentiment are mentioned approvingly, establishing clear in-group and out-group boundaries. This is identity formation masquerading as analysis — good people resist, compromised people succumb.
There is a false dichotomy running through the piece that deserves attention. The implied choice is between the author’s total abstention, not touching LLMs “with a stick,” and being consumed by the pathological grind described earlier. No middle ground exists in this telling. The possibility of critical, limited, or thoughtful engagement with these tools is never acknowledged as legitimate. You are either pure or contaminated.
Reality doesn’t work this way! It’s not black and white. My take: AI is a transformative technology and the spectrum of uses and misuses of AI is vast and growing.
The philosophical core of their argument also contains an unexamined equivocation. The author invokes the extended cognition thesis — the idea that tools become part of us and shape who we are — to make AI seem uniquely threatening. But this same argument applies to every tool mentioned in the piece: hammers, pens, keyboards, dictionaries. The author describes their own fingers “flying over the keyboard, switching windows, opening notes, looking up words in a dictionary” as part of their extended cognitive process. If consulting a dictionary shapes thought and becomes part of our cognitive process, what exactly distinguishes that from asking a language model to check grammar or suggest a word? The author never establishes what makes AI categorically different from the other tools that have already become part of us. The danger is assumed rather than demonstrated.
There is also a genetic fallacy at work in the argument about power. The author suggests AI is bad partly because of who controls it — surveillance capitalists, fascists, those with enormous physical infrastructure. But this argument conflates the origin and ownership of a technology with its inherent properties. One could make identical arguments about the printing press, the telephone, or the internet itself. The question of whether these tools could be structured differently, owned differently, or used toward different ends is never engaged. Everything becomes evidence of a monolithic system of control.
Finally, there is an unacknowledged irony in the piece’s medium and advice. The author recommends spending less time on social media and reading books instead, while writing a blog post clearly designed for social sharing, complete with the vivid metaphors, escalating moral stakes, and calls to action that characterize viral content. The post exists within and depends upon the very attention economy it criticizes. This is not necessarily hypocrisy — we all must operate within systems we find problematic — but the lack of self-awareness about it is notable given how readily the author judges others for their compromises.
The essay is most compelling when it stays concrete: the phenomenology of writing as discovery, the real pressures workers face, the genuine concerns about who controls these systems and toward what ends. It is weakest when it reaches for grand unified theories of intentional domination, when it mistakes assertion for argument, and when it allows moral contempt to override the structural analysis it claims to offer. The author clearly cares about human flourishing and autonomy, but the piece would be stronger if that care extended more generously to those navigating these technologies without the privilege of refusal.
Your reading of the addiction angle is much different than mine.
I didn't hear the author criticizing the character of their colleagues. On the contrary, they wrote a whole section on how folks are pressured or forced to use AI tools. That pressure (and fear of being left behind) drives repeated/excessive exposure. That in turn manifests as dependence and progressive atrophy of the skills they once had. Their colleagues seem aware of this as evidenced by "what followed in most of them, almost like a reflex, was a self-justification of why the way they use these tools is fine". When you're dependent on something, you can always find a 'reason'/excuse to use. AA and other programs talk about this at length without morally condemning addicts or assigning individual blame.
> For most of us, self-justification was the maker of excuses; excuses, of course, for drinking, and for all kinds of crazy and damaging conduct. We had made the invention of alibis a fine art. [...] We had to drink because at work we were great successes or dismal failures. We had to drink because our nation had won a war or lost a peace. And so it went, ad infinitum. We thought "conditions" drove us to drink, and when we tried to correct these conditions and found that we couldn't to our entire satisfaction, our drinking went out of hand
Framing something as addictive does not necessarily mean that those suffering from it are failures/weak/immoral but you seem to have projected that onto the author.
Their other analogy ("brainworm") is similar. Something that no-one would willingly sign up for if presented with all the facts up front but that slips in and slowly develops into a serious issue. Faced with mounting evidence of the problem, folks have a strong incentive to downplay the issue because it's cognitively uncomfortable and demands action. That's where the "harmful coping" comes in: minimizing the severity of the problem, avoiding the topic when possible, telling yourself or others stories about how you're in control or things will work out fine, etc.
This is a cool result. Deep learning image models are trained on enormous amounts of data and the information recorded in their weights continues to astonish me. Over in the Stable Diffusion space, hobbyists (as opposed to professional researchers) are continuing to find new ways to squeeze intelligence out of models that were trained in 2022 and are considerably out of date compared with the latest “flow matching” models like Qwen Image and Flux.
Makes you wonder what intelligence is lurking in a 10T parameter model like Gemini 3 that we may not discover for some years yet…
Stable Diffusion 1.5 is a great model for hacking on. It's powerful enough that it encodes some really rich semantics, but small and light enough that iterative hacking on it is quick enough that it can be done by hobbyists.
I've got a new potential LoRA implementation that I've been testing locally (using a transformed S matrix with frozen U and V weights from an SVD decomposition of the base matrix) that seems to work really well, and I've been playing with changes to both the forward-noising schedule and the loss functions, which seem to yield empirically superior results compared with the standard way of doing things. Epsilon prediction may be old and busted (and working on it makes me really appreciate flow matching!), but there's some really cool stuff happening in its training dynamics that is a lot of fun to explore.
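In spirit it looks something like this (a simplified sketch of the idea, not the exact implementation; the rank and zero init here are arbitrary):

    import torch
    import torch.nn as nn

    class SVDLoRALinear(nn.Module):
        # Wrap a frozen Linear layer and learn an update in its SVD basis:
        # W = U diag(S) Vh, U and Vh stay frozen, and only a small trainable
        # matrix acting in the singular-value space is learned.
        def __init__(self, base: nn.Linear, rank: int = 16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad_(False)
            U, S, Vh = torch.linalg.svd(base.weight.data, full_matrices=False)
            self.register_buffer("U", U[:, :rank])    # (out, r), frozen
            self.register_buffer("Vh", Vh[:rank, :])  # (r, in), frozen
            # Zero init so training starts from the unmodified base model.
            self.delta = nn.Parameter(torch.zeros(rank, rank))

        def forward(self, x):
            update = self.U @ self.delta @ self.Vh    # (out, in)
            return self.base(x) + nn.functional.linear(x, update)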
It's just a lot of fun. Great playground for both learning how these things work and for trying out new ideas.
I do (same username), but I haven't published any of this (and in fact my Github has sadly languished lately); I keep working on it with the intent to publish eventually. The big problem with models like this is that the training dynamics have so many degrees of freedom that every time I get close to something I want to publish I end up chasing down another set of rabbit holes.
https://gist.github.com/cheald/7d9a436b3f23f27b8d543d805b77f... - here's a quick dump of my SVDLora module though. I wrote it for use in OneTrainer though it should be adaptable to other frameworks easily enough. If you want to try it out, I'd love to hear what you find.
This is super cool work. I’ve built some new sampling techniques for flow matching models that encourage the model to take a “second look” by rewinding sampling to a midpoint and then running the clock forward again. This worked really well with diffusion models (pre-DiT models like SDXL) and I was curious whether it would work with flow matching models like Qwen Image. Yes, it does, but the design is different because flow matching models aren’t de-noising pixels so much as they are simply following a vector field at each step like a ship being pushed by the wind.
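For the diffusion-model case, the mechanics are roughly as follows (a simplified sketch; step() stands in for a single sampler update, and the flow-matching version swaps the noise re-injection for a jump back along the velocity field):

    import torch

    def second_look_sample(step, x, sigmas, rewind_at=0.5, rewind_depth=5):
        # Follow the normal sigma schedule, but once past a chosen midpoint,
        # re-add noise to jump back a few steps and integrate forward again.
        n = len(sigmas) - 1
        mid = int(n * rewind_at)
        i, rewound = 0, False
        while i < n:
            x = step(x, sigmas[i], sigmas[i + 1])
            i += 1
            if not rewound and i >= mid:
                j = max(i - rewind_depth, 0)
                # For a variance-exploding schedule, climbing back from
                # sigma_i to the larger sigma_j means adding Gaussian noise
                # with std sqrt(sigma_j^2 - sigma_i^2).
                x = x + (sigmas[j] ** 2 - sigmas[i] ** 2) ** 0.5 * torch.randn_like(x)
                i, rewound = j, True
        return x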
It seems conceptually related to ddpm/ancestral sampling, no? Except they're just adding noise to the intermediate latent to simulate a "trajectory jump". How does your method compare?
Hey, how did you find out about this? I would be super curious to keep track of current ad-hoc ways of pushing older models to do cooler things. LMK
I read that the pre-training model behind Gemini 3 has 10T parameters. That does not mean that the model they’re serving each day has 10T parameters. The online model is likely distilled from 10T down to something smaller, but I have not had either fact confirmed by Google. These are anecdotes.
My favorite benchmark is to analyze a very long audio file recording of a management meeting and produce very good notes along with a transcript labeling all the speakers. 2.5 was decently good at generating the summary, but it was terrible at labeling speakers. 3.0 has so far absolutely nailed speaker labeling.
My audio experiment was much less successful — I uploaded a 90-minute podcast episode and asked it to produce a labeled transcript. Gemini 3:
- Hallucinated at least three quotes (that I checked) resembling nothing said by any of the hosts
- Produced timestamps that were almost entirely wrong. Language quoted from the end of the episode, for instance, was timestamped 35 minutes into the episode, rather than 85 minutes.
- Almost all of what is transcribed is heavily paraphrased and abridged, in most cases without any indication.
It's understandable that Gemini can't cope with such a long audio recording yet, but I would've hoped for a more graceful, less hallucinatory failure mode. Unfortunately, this aligns with my impression of past Gemini models: impressively smart, but failing in the most catastrophic ways.
I wonder if you could get around this with a slightly more sophisticated harness. I suspect you're running into context length issues.
Something like
1.) Split audio into multiple smaller tracks.
2.) Perform first pass audio extraction
3.) Find unique speakers and other potentially helpful information (maybe just a short summary of where the conversation left off)
4.) Seed the next stage with that information (yay multimodality) and generate the audio transcript for it
Obviously it would be ideal if a model could handle the ultra long context conversations by default, but I'd be curious how much error is caused by a lack of general capability vs simple context pollution.
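In rough Python, the harness could look something like this (transcribe() is a stand-in for whatever multimodal model call you're using; steps 2 and 3 are delegated to it, and only the chunking and context hand-off are sketched):

    from pydub import AudioSegment  # splitting step; assumes ffmpeg is installed

    def chunked_transcript(path, transcribe, chunk_minutes=15):
        # transcribe(audio_bytes, context) should return
        # (labeled_transcript, handoff_summary) for one chunk.
        audio = AudioSegment.from_file(path)
        chunk_ms = chunk_minutes * 60 * 1000
        context = "Start of recording; no speakers identified yet."
        pieces = []
        for start in range(0, len(audio), chunk_ms):
            clip = audio[start:start + chunk_ms].export(format="mp3").read()
            transcript, context = transcribe(clip, context)  # seed next pass
            pieces.append(transcript)
        return "\n".join(pieces)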
I'd do the transcript and the summary parts separately. Dedicated audio models from vendors like ElevenLabs or Soniox use speaker detection models to produce an accurate speaker-based transcript, whereas I'm not sure Google's models do; maybe they just hallucinate the speakers instead.
I just tried "analyze this audio file recording of a meeting and notes along with a transcript labeling all the speakers" (using the language from the parent's comment) and indeed Gemini 3 was significantly better than 2.5 Pro.
3 created a great "Executive Summary", identified the speakers' names, and then gave me a second-by-second transcript:
[00:00] Greg: Hello.
[00:01] X: You great?
[00:02] Greg: Hi.
[00:03] X: I'm X.
[00:04] Y: I'm Y.
...
I made a simple webpage to grab text from YouTube videos:
https://summynews.com
Might be great for this kind of testing?
(want to expand to other sources in the long run)
Subject (((((Solar battery) costs) plummet) analysis) findings)
Verb [back]
Object (anytime (electricity availability))
Garden path sentence structure trap creation relies on initial word parse error encouragement. Brain pattern recognition system default subject-verb-object order preference exploitation causes early stop interpretation failure.
Solar battery costs plummet phrase acting as complex noun modifier group creates false sentence finish illusion. Real subject findings arrival delay forces mental backtrack restart necessity.
Noun adjunct modifier stack length excess impacts processing speed negatively. Back word function switch from direction noun to support verb finalizes reader confusion state.
We write to be understood. Short sentences and simple words make the truth easy to see.