Hacker News | musebox35's comments

Debating how not to use AI will get no one anywhere, since negative framing rarely works with humans (it does not work well with LLMs either). Let's concentrate on how to build closed-loop systems that verify LLM output, how to manage context, and how to build failsafes around agentic systems. Then, and only then, might we start to make progress.

Not all parts of the code are equal in this respect. The parts pertaining to the user-visible surface (the API of a library, the command-line arguments of a CLI, the UI of a GUI/TUI app, the endpoints of a web service, etc.) are closely tied to the spec. The rest is more fluid, as long as it does not change user-visible behavior. Those choices still affect maintenance and debugging costs, so there is some pressure not to YOLO these portions. I think the most difficult design decisions are how to separate the two and how to ensure a smooth evolution of both the user-facing and the programmer-facing design.

What is different now is that maintainability and debugging decisions used to be made with human coders or teams in mind, which is not necessarily the case anymore. Should we just specify the API and let agents figure out the rest, or do we still want to control the rest to ensure maintainability and security? A year ago I definitely thought we should. Now it is murkier: agents browse codebases faster than I can, and they can explore runtime effects faster than I can type and parse output. The strongest empirical observations depend on runtime behavior, so agents have an edge there.


I am rereading the Asimov robot novels. A decrease in human-to-human interaction is a major side effect he foresaw. Decreasing interaction and collaboration are among the core themes.

I think this question is one of the more concrete and practical ways to attack the problem of understanding transformers. Empirically, the current architecture is the best at converging under gradient-descent training dynamics. A different form might be possible, and even beneficial, once the core learning task is completed. The requirements of iterated and continual learning might also lead to a completely different approach.


Thanks for posting a thorough and accurate summary of the historical picture. I think it is important to know the past trajectory to extrapolate correctly to the future.

For a bit more context: before 2012, most approaches were based on hand-crafted features + SVMs, which achieved state-of-the-art performance on academic competitions such as Pascal VOC; neural nets were not competitive on the surface. Around 2010, Fei-Fei Li of Stanford University collected a comparatively large dataset and launched the ImageNet competition. AlexNet cut the error rate roughly in half in 2012, leading major labs to switch to deeper neural nets. The success seems to have been a combination of a large enough dataset and GPUs that made training time reasonable. The architecture is a scaled-up version of Yann LeCun's ConvNets, tying into the bitter lesson that scaling matters more than complexity.


They say that they did test, but the coverage was not enough to pick it up, at least for the prompt change:

“After multiple weeks of internal testing and no regressions in the set of evaluations we ran, we felt confident about the change and shipped it alongside Opus 4.7 on April 16.

As part of this investigation, we ran more ablations (removing lines from the system prompt to understand the impact of each line) using a broader set of evaluations. One of these evaluations showed a 3% drop for both Opus 4.6 and 4.7. We immediately reverted the prompt as part of the April 20 release.”

Considering the number and scope of users they serve, I can sympathize with the difficulty. However, they should at least partially reimburse affected users instead of just announcing “our bad, sorry.” That would reduce the frustration.


Naively, one could assume that with AI it should be possible to create a long and broad list of test cases…


I attended the related session at Next '26 yesterday. From my understanding, it is a new backend, and they will release the torch TPU source on GitHub in one or two months. It will not support all ops initially, but they are moving fast. In the meantime, torchax is mature enough to run torch models on TPUs by translating them to JAX.


My ancient boxed copy of Visual Basic for DOS 1.0 that supported mouse clicks on TUI buttons would have found your viewpoint quite offensive if it had any AI in it ;-) Oh boy, good old days.


A similar trend in open text-to-image models: Flux.1 was 12B, but now we have 6B models with much better quality. Qwen Image went from 20B to 7B while merging in the edit line and improving quality. Now that the cost of spot H200s with 140GB has come down to A100 levels, you can finally try larger-scale finetuning/distillation/RL with these models. A very promising direction for open tools and models if the trend continues.


I guess the sense of accomplishment is very person-dependent. I enjoy programming a lot, but it is easy to find people who would instead challenge themselves to scale said website to a million users or X views per day. I don't know why; probably there is no fixed meaning to existence, and nature likes diversity.

For me, the fun in programming also depends a lot on the task. Recently, I wanted Python configuration classes that serialize to YAML, but I also wanted to automatically create an ArgumentParser that fills in some of the fields. `hydra` from Meta does that, but I wanted something simpler. I asked an agent for a design, but I did not like the convoluted parsing logic it created. I finally designed something by hand by abusing the metadata field of `dataclasses.field` calls. It was deeply satisfying to get it to work the way I wanted.
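Not my actual code, but a minimal sketch of the metadata trick (YAML serialization omitted; the `cli` helper and `TrainConfig` names are illustrative): each dataclass field carries a metadata flag saying whether it should become a CLI argument, and a small function walks the fields to build the parser.

```python
import argparse
from dataclasses import dataclass, field, fields

def cli(help_text, **kw):
    # Mark a dataclass field as CLI-exposed via its metadata dict.
    return field(metadata={"cli": True, "help": help_text}, **kw)

@dataclass
class TrainConfig:
    lr: float = cli("learning rate", default=1e-3)
    epochs: int = cli("number of epochs", default=10)
    seed: int = field(default=0)  # internal; not exposed on the CLI

def make_parser(cfg_cls):
    # Build an ArgumentParser only from fields flagged in metadata.
    parser = argparse.ArgumentParser()
    for f in fields(cfg_cls):
        if f.metadata.get("cli"):
            parser.add_argument(f"--{f.name}", type=f.type,
                                default=f.default, help=f.metadata["help"])
    return parser

args = make_parser(TrainConfig).parse_args(["--lr", "0.01"])
cfg = TrainConfig(**vars(args))  # seed keeps its dataclass default
```

The nice property is that the config class stays the single source of truth: the parser is derived from it, so adding a field to the dataclass is all it takes to expose a new flag.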

But after that, do I really want to create every config class and fill in every field by myself for the several scripts/classes I planned to write? Once the initial template was there, I was happy to just guide the agent to fill in the boilerplate.

I agree that we should keep the fun in programming/art, but how we do that depends on the what, the who, and the when.

