You personally wouldn't use live captions and dubbing, so there's no point building it for the millions of people who need it as an accessibility feature?
I've seen a few of this type of thing pop up in search results ("DeepWiki" by Cognition.) I'm not a fan. It is just LLM contentslop, basically. Actual wikis written by humans are made of actual insight from developers and consumers. "We intend you use it in X way", "If you encounter Y issue, do Z." etc. Look at arch wiki. Peak wiki-style documentation, LLMs could never recreate. Well, maybe with a future iteration of the technology they can be useful. But for now, you do not gain much by essentially restating code, API interfaces, and tests in prose. They take up space from legitimate documentation and developer instruction in search results.
I think this wound up being close enough to true, it's just that it actually says less than what people assumed at the time.
It's basically the Jevons paradox for code. The price of lines of code (in human engineer-hours) has decreased a lot, so there is a bunch of code that is now economically justifiable which wouldn't have been written before. For example, I can prompt several ad-hoc benchmarking scripts in 1-2 minutes to troubleshoot an issue which might have taken 10-20 minutes each by myself, allowing me to investigate many performance angles. Not everything gets committed to source control.
Put another way, at least in my workflow and at my workplace, the volume of code has increased, and most of that increase comes from new code that would not have been written if not for AI, and a smaller portion is code that I would have written before AI but now let the AI write so I can focus on harder tasks. Of course, it's uneven penetration, AI helps more with tasks that are well-described in the training set (webapps, data science, Linux admin...) compared to e.g. issues arising from quirky internal architecture, Rust, etc.
At an individual level, I think it is for some people. Opus/Sonnet 4.5 can tackle pretty much any ticket I throw at it on a system I've worked on for nearly a decade. Struggles quite a bit with design, but I'm shit at that anyway.
It's much faster for me to just start with an agent, and I often don't have to write a line of code. YMMV.
Sonnet 3.7 wasn't quite at this level, but we are now. You still have to know what you're doing mind you and there's a lot of ceremony in tweaking workflows, much like it had been for editors. It's not much different than instructing juniors.
Roughly, this is the Electronic Frontier Foundation (and comparable lobbying orgs in other countries.) However, an org like this doesn't have much power to compel individuals to give them $1.
LLM argumentative essays tend to have this "gish-gallop" energy; say a bunch of tenuously related and vaguely supported things, leave the reader wondering if it was the author who failed to connect the dots, or them
My wife wanted a sapphire and we met during Ph D research. It's straight up not possible to pay more then like, a dollar for a synthetic sapphire so that's what's in her ring.
I don't know if it matters. Even if the best we can do is get really good at interpolating between solutions to cognitive tasks on the data manifold, the only economically useful human labor left asymptotes toward frontier work; work that only a single-digit percentage of people can actually perform.
The shared reward function from pre-training is like primary school for an LLM. Maybe RLHF is like secondary school. The governor can be differentiated from the workers with different system and user prompts, fine tuning, etc., which might be similar to medical school or law school for a human.
Certainly human judges, attorneys for defense and prosecution, and members of the jury can still perform their jobs well even if they attended the same primary and secondary schools.
I see what you are getting at. My point is that if you train and agent and verifier/governor together based on rewards from e.g. RLVR, the system (agent + governor) is what will reward hack. OpenAI demonstrated this in their "Learning to Reason with CoT" blog post, where they showed that using a model to detect and punish strings associated with reward hacking in the CoT just led the model to reward hack in ways that were harder to detect. Stacking higher and higher order verifiers maybe buys you time, but also increases false negative rates + reward hacking is a stable attractor for the system.
reply