> tools that reliably turn slapdash prose into median-grade idiomatic working code
This may be the crux of it.
Turning slapdash prose into median-grade code is not a problem I can imagine needing to solve.
I think I'm better at describing code in code than I am in prose.
I Want to Believe. And I certainly don't want to be "that guy", but my honest assessment of LLMs for coding so far is that they are a frustrating junior whom I should maybe help out, because mentoring might be part of my job, but from whom I should not expect any near-term technical contribution.
Sorry, are you saying "the only place where there's slapdash prose is right before it would be super cool to have an alpha version of the code magically appear, that we can iterate on based on the full context of the team, company, and industry"?
Alpha code with zero context is an utter waste of attention.
I must be confused about how y'all are developing software, because the path from "incompletely specified takeaways from a product design meeting" to "final product" does not pass through any intermediate steps where reduced contextual awareness is valuable.
The median-quality code just doesn't seem like a valuable asset en route to final product, but I guess it's a matter of process at that point.
Generative AI, as I've managed to use it, brings me to a place in the software lifecycle where I don't want to be: median-quality code that lacks the context or polish needed to be usable, or in some cases even parseable.
I may be missing essential details though. Smart people are getting more out of AI than I am. I'd love to see a YouTube/Twitch/etc video of someone who knows what they're doing demoing the build of a typical TODO app or similar, from paragraphs to product.
Median-quality code is extraordinarily valuable. It is most of the load-bearing code people actually ship. What's almost certainly happening here is that you and I have differing definitions of "median-quality" commercial code.
I'm pretty sure that if we triangle-tested (say) Go code from 'jerf against Gemini 2.5's output for the same (substantial; say, 2,000-line) project --- not whatever Gemini's initial spew is, but a final product where Gemini is the author of 80+% of the lines --- you would not be able to pick the human code out from the LLM code.
A couple prompt/edit "cycles" into a Cursor project, Gemini's initial output gives me better-than-junior code, but still not code I would merge. But you review that code, spot the things you don't like (missed idioms, too much repetition, weird organization) and call them out; Gemini goes and fixes them. The result of that process is code that I would merge (or that would pass a code review).
What I feel like I keep seeing is people who see that initial LLM code "proposal", don't accept it (reasonably!), and end the process right there. But that's not how coding with an LLM works.
I've gone many cycles deep, some of which have resulted in incremental improvements.
Probably one of my mistakes is testing it with toy challenges, like bad interview questions, instead of workaday stuff that we would normally do in a state of half-sleep.
The latter would require loading the entire project into context, and the value would be low.
My thought with the former is that it should be able to produce working versions of industry-standard algorithms (bubble sort, quicksort, n digits of pi, Luhn, CRC32 checksum, timezone and offset math, etc.) without requiring any outside context (proprietary code) -- and, perhaps erroneously, that if it fails to pull off such parlor tricks, making glaring errors in the process, it couldn't add value elsewhere either.
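For concreteness, the Luhn check is the kind of zero-context parlor trick I have in mind; a minimal Go sketch of the test (my own illustration here, not any model's output) might look like:

```go
package main

import "fmt"

// luhnValid reports whether a digit string passes the Luhn checksum:
// starting from the rightmost digit, double every second digit,
// subtract 9 from any doubled value over 9, and require the sum % 10 == 0.
func luhnValid(number string) bool {
	sum := 0
	double := false // the check digit (rightmost) is never doubled
	for i := len(number) - 1; i >= 0; i-- {
		c := number[i]
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9
			}
		}
		sum += d
		double = !double
	}
	return sum%10 == 0
}

func main() {
	fmt.Println(luhnValid("79927398713")) // true: canonical valid Luhn number
	fmt.Println(luhnValid("79927398710")) // false: corrupted check digit
}
```

If a model can't get something this self-contained right, that's what makes me (perhaps unfairly) doubt it elsewhere.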
Why are you hesitating to load all the context you need? (Cursor will start from a couple of starting-point files you explicitly add to the context window and then go track other stuff down.) It's a machine. You don't have to be nice to it.
Just the usual "is this service within our trust perimeter" hesitation, when it comes to sharing source code.
I expected to get better results from my potted tests, and to assemble a justification for expanding the perimeter of trust. This hasn't happened yet, but I definitely see your point.
Presumably it would also be possible to hijack Cursor's network desires and redirect to a local LLM that speaks the same protocol.