Hacker News | new | past | comments | ask | show | jobs | submit | gnat's comments

Nice! Your spec-maxxing is very resonant. I've been working with explicit requirements: elicit them from conversation with me or by introspecting another piece of software; one-shot from them; and keep them up to date as I do the "old man shouts at Claude" iterations after whatever one-shotting came up with.

Unlike you, I wish for the LLM to do as much of the work as possible -- but "as possible" is doing a lot of work in that sentence. I'm still trying to get clear on exactly where I am needed and where Opus and iterations will get there eventually.

It has really challenged me to get clearer on what a requirement is vs a constraint (e.g., "you don't get to reinvent the database schema, we're building part of a larger system"). And I still battle with when and how to specify UI behaviours: so much UI is implicit, and it seems quite daunting to have to specify so much to get it working. I have new respect for whoever wrote the undoubtedly bajillion tests for Flutter and other UI toolkits.


Forgot to add: I get several benefits from doing this.

1. Specifications that live outside the code. We have a lot of code for which "what should this do?" is a subjective answer, because "what was this written to do?" is either oral legend or lost in time. As future Claude sessions add new features, this is how Claude can remember what was intentional in the existing code and what were accidents of implementation. And they're useful for documenters, support, etc.

2. Specifications that stay up to date as code is written. No spec survives first contact with the enemy (implementation in the real world). "Huh, there are TWO statuses for Missing orders, but we wrote this assuming just one. How do we display them? Which are we setting or is it configurable?" etc. Implementer finds things the specifier got wrong about reality, things the specifier missed that need to be specified/decided, and testing finds what they both missed.

I have a colleague working on saving architecture decisions, and his description of it feels like a higher-abstraction version of my saving and maintaining requirements.


Specifications don't tell you what to do; they say what the end state should be. In between, you need a codebase analysis step and an implementation plan.

My recursive-mode workflow handles all of that and more and gives you full traceability: https://recursive-mode.dev/introduction


I do (1) the same but (2) differently. In my workflow, (2) is AI-generated specs using the human-written (1) as the input. It's an intermediate stage between (1) and the codebase, allowing for a gradual token expansion from 30k to 250k to the final code, which is 2-3M. The benefit I've found with this approach is that it gives the AI a way to iterate on the details of the whole system in one context window, whereas fitting the whole codebase into one prompt is impossible. The code is then nothing more than a style transfer from (2).
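The staged expansion described above can be sketched as a two-stage pipeline. This is a minimal, hypothetical sketch: `call_llm`, `expand_spec`, and `generate_code` are illustrative names, and the model call is stubbed so the pipeline shape is runnable; a real version would call an actual LLM API at each stage.

```python
# Staged expansion: human spec (1) -> AI-expanded spec (2) -> code.
# `call_llm` is a hypothetical stand-in for a real model API call,
# stubbed here so the structure of the pipeline is visible and runnable.

def call_llm(prompt: str) -> str:
    # Stub: a real implementation would send `prompt` to a model
    # and return its completion.
    return f"[model output for: {prompt[:40]}...]"

def expand_spec(human_spec: str) -> str:
    """Stage (2): expand the terse human spec into a detailed system spec
    that still fits in one context window."""
    return call_llm(
        "Expand this requirements document into a detailed spec "
        "covering every module, interface, and edge case:\n" + human_spec
    )

def generate_code(detailed_spec: str, module: str) -> str:
    """Final stage: 'style transfer' from the detailed spec to one
    module's code, generated per module since the whole codebase
    can't fit in a single prompt."""
    return call_llm(
        f"Implement module '{module}' exactly as described in this spec:\n"
        + detailed_spec
    )

human_spec = "Orders can be Missing; show their status on the dashboard."
detailed = expand_spec(human_spec)           # one context window
code = generate_code(detailed, "dashboard")  # repeated per module
```

The key design point is that only stage (2) needs the whole system in view at once; code generation can then be sharded per module.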

Let's cut through the noise: what did you build with this very elaborate process, and how much ARR is it generating?

Asking the real questions. I would also really like to know how much value AIs are bringing in terms of ARR or MRR.


I'm not trying to convert you, just want to share process tips that I see working for me and others. We're using agents, not a chat, because they can do complex work in pursuit of a goal.

1. Make artifacts. If you're doing research into a tech, or a hypothesis, then fire off subagents to explore different parts of the problem space, each reporting back into a doc. Then another agent synthesizes the docs into a conclusion/report.

2. Require citations. "Use these trusted sources. Cite trusted sources for each claim. Cite with enough context that it's clear your citation supports the claim, and refuse to cite if the citation doesn't support the claim."

3. Review. This lets you then fire off a subagent to review the synthesis. It can have its own prompt: look for confirming and disconfirming evidence, don't trust uncited claims. If you find it making conflation mistakes, figure out at what stage and why, and adjust your process to get in front of them.

4. Manage your context. An LLM only has a fixed context size ("chat length"), and facts & instructions at the front of that tend to be followed more reliably than things at the end. Subagents are a way of managing that context to get more from a single run. Artifacts like notebooks or records of subagent output move content outside the context so you can pick up in a new session ("chat") and continue the work.
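The fan-out-and-synthesize pattern from steps 1-3 above can be sketched like this. `run_subagent` is a hypothetical stand-in for whatever agent framework you use, stubbed here so the overall shape is runnable; the prompts are paraphrased from the steps above.

```python
# Fan out subagents over a problem space, each producing a note
# (artifact); then synthesize the notes; then review the synthesis.
# `run_subagent` is hypothetical/stubbed -- swap in a real agent runner.

def run_subagent(prompt: str) -> str:
    # Stub: a real subagent would do the research in its own fresh
    # context and return a cited write-up.
    return f"NOTES({prompt})"

def research(topic: str, subtopics: list[str]) -> str:
    # 1. Make artifacts: one note per subtopic, from independent subagents.
    notes = [
        run_subagent(
            f"Research '{sub}' within '{topic}'. "
            "2. Require citations: cite a trusted source for each claim; "
            "refuse to cite if the citation doesn't support the claim."
        )
        for sub in subtopics
    ]
    # Another agent synthesizes the artifacts into a conclusion/report.
    report = run_subagent(
        "Synthesize these notes into a report, keeping citations:\n"
        + "\n\n".join(notes)
    )
    # 3. Review: a separate subagent with its own adversarial prompt.
    return run_subagent(
        "Review this report. Look for confirming and disconfirming "
        "evidence; don't trust uncited claims:\n" + report
    )

result = research("on-device speech recognition", ["accuracy", "latency"])
```

Each `run_subagent` call gets a fresh context, which is what keeps any single context window from filling up (step 4).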

It's less fun than just having a chat with ChatGPT, but I find that I get much better quality results using these techniques. Hope this helps! If you're not interested in doing this (too much like work, and you already have something that works), it's no skin off my nose. All the best!


Thanks for the thoughtful reply! I definitely want to try a more complex setup when I have more time on my hands.


We're in the brief window of time when AI's writing style reads as weird. It's an artifact of the production process, like JPEG blur, MP3 distortion, or Auto-Tune's rigidity. And it didn't take long for those things to become normalized, in fact for them to become artifacts that people proudly adopted and embraced. DJs release tracks built from MP3 samples instead of WAVs. Auto-Tune is famously a 'sound' that was once something to be subtly added and never confessed to, but which genres and artists now lean into rather than away from.

Long story short: I think emoji in headings and lists, em dashes, and the vile TED Talk paragraph structure of "long sentence with lots of words asking a question or introducing a possibility. followed by. short sentences. rebutting. or affirming." are here to stay. My money is that it gets normalized and embraced as "well of course that's how you best communicate because I see it everywhere."


Short sentences were popularized in writing only in the last hundred and fifty years. Styles change.


Yes, but it's kinda sad, isn't it, that this robotic way of writing in turn teaches a new generation of people how to write?

Also, you forgot the extremely enervating: "It's not X. It's Y. <Clincher>."


> "well of course that's how you best communicate because I see it everywhere."

These assumptions might also change, though. Up until now, any writing you saw "everywhere" was probably written by someone who studied and loved written communication and was bringing their artisanal care to the table. That's no longer the case.

It's called slop for a reason. When I come across a GitHub README written by AI, I don't feel put off just because the author used AI to write it; I feel frustrated because it's genuinely communicating poorly with me. Full of extraneous details, artifacts from the conversation, and stuff I already know ("uses GitHub to share the source democratically!").



(Hi, Tom!) Reread the article and look for “CPU”. The whole article is about doing deep learning on CPUs not GPUs. Moonshine, the open source project and startup he talks about, shows speech recognition and realtime translation on the device rather than on a server. My understanding is that doing The Math in parallel is itself a performance hack, but Doing Less Math is also a performance hack.


What have you done to make Claude stronger on brownfields work? This is very interesting to me.


I hate its acknowledgement of its personality prompt. Try having a series of back-and-forths where each response is like "got it, keeping it short and professional. Yes, there are only seven deadly sins." You get more prompt performance than answer.


I like the term prompt performance; I am definitely going to use it:

> prompt performance (n.)

> the behaviour of a language model in which it conspicuously showcases or exaggerates how well it is following a given instruction or persona, drawing attention to its own effort rather than simply producing the requested output.

:)


Might be a result of using LLMs to evaluate the output of other LLMs.

LLMs probably get higher scores if they explicitly state that they are following instructions...


It's like writing an essay for a standardized test, as opposed to one for a college course or for a general audience. When taking a test, you only care about the evaluation of a single grader hurrying to get through a pile of essays, so you should usually attempt to structure your essay to match the format of the scoring rubric. Doing this on an essay for a general audience would make it boring, and doing it in your college course might annoy your professor. Hopefully instruction-following evaluations don't look too much like test grading, but this kind of behavior would make some sense if they do.


That's the equivalent of a performative male, so better to call it performative model behaviour.


Pay people $1 an hour and ask them to choose A or B: which is more short and professional?

A) Keeping it short and professional. Yes, there are only seven deadly sins

B) Yes, there are only seven deadly sins

Also, have all the workers know they are being evaluated against each other, and that if they diverge from the majority choice their reliability score may go down and they may get fired. You end up with some evaluations answered as a Keynesian beauty contest / Family Feud "survey says"-style guess instead of their true evaluation.


I can’t tell if you’re being satirical or not…



jfc thank you for the context


This is even worse on voice mode. It's unusable for me now.


Tl;dr: ThoughtWorks' founder is spending his millions portraying Chinese government policies, including Xinjiang/Uighurs, in a positive light. His spending is heavily laundered, but he's now based in China and working in the same offices as a propaganda company.


Calendar was brilliant. I think it was the first time I fully appreciated the misery of the human mind in the face of various orbit periods that aren't simple integer ratios of one another. https://www.bbc.co.uk/programmes/p00548m9

Great Fire of London too. Pepys burying his cheese! https://www.bbc.co.uk/programmes/b00ft63q

Politeness. Social barriers were coming down, you were interacting with people of different rank, how do you not get into a swordfight? Also, the letter from the wife complaining about her husband! https://www.bbc.co.uk/programmes/p004y29m

I think they did all the big interesting things in history and then struggled with a lot of minor events that were hard to find interesting angles on.


Thank you! Worth reading, if only for the phrase “global taint ruler”.

