> tools that reliably turn slapdash prose into median-grade idiomatic working code
This may be the crux of it.
Turning slapdash prose into median-grade code is not a problem I can imagine needing to solve.
I think I'm better at describing code in code than I am in prose.
I Want to Believe. And I certainly don't want to be "that guy", but my honest assessment of LLMs for coding so far is that they are a frustrating junior whom I should maybe help out, because mentoring might be part of my job, but from whom I should not expect any near-term technical contribution.
Sorry, are you saying "the only place where there's slapdash prose is right before it would be super cool to have an alpha version of the code magically appear, that we can iterate on based on the full context of the team, company, and industry"?
Alpha code with zero context is an utter waste of attention.
I must be confused about how y'all are developing software, because the path from "incompletely specified takeaways from a product design meeting" to "final product" does not pass through any intermediate steps where reduced contextual awareness is valuable.
The median-quality code just doesn't seem like a valuable asset en route to final product, but I guess it's a matter of process at that point.
Generative AI, as I've managed to use it, brings me to a place in the software lifecycle where I don't want to be: median-quality code that lacks the context or polish needed to be usable, or in some cases even parseable.
I may be missing essential details though. Smart people are getting more out of AI than I am. I'd love to see a YouTube/Twitch/etc video of someone who knows what they're doing demoing the build of a typical TODO app or similar, from paragraphs to product.
Median-quality code is extraordinarily valuable. It is most of the load-bearing code people actually ship. What's almost certainly happening here is that you and I have differing definitions of "median-quality" commercial code.
I'm pretty sure that if we triangle-tested (say) Go code from 'jerf against Gemini 2.5's output for the same (substantial; say, 2,000-line) project --- not whatever Gemini's initial spew is, but a final product where Gemini is the author of 80+% of the lines --- you would not be able to pick the human code out from the LLM code.
A couple prompt/edit "cycles" into a Cursor project, Gemini's initial output gives me better-than-junior code, but still not code I would merge. But you review that code, spot the things you don't like (missed idioms, too much repetition, weird organization) and call them out; Gemini goes and fixes them. The result of that process is code that I would merge (or that would pass a code review).
What I feel like I keep seeing is people who see that initial LLM code "proposal", don't accept it (reasonably!), and end the process right there. But that's not how coding with an LLM works.
I've gone many cycles deep, some of which have resulted in incremental improvements.
Probably one of my mistakes is testing it with toy challenges, like bad interview questions, instead of workaday stuff that we would normally do in a state of half-sleep.
The latter would require loading the entire project into context, and the value would be low.
My thought with the former is that it should be able to produce working versions of industry-standard algorithms (bubble sort, quicksort, n digits of pi, Luhn, CRC32 checksum, timezone and offset math, etc.) without requiring any outside context (proprietary code) -- and, perhaps erroneously, that if it fails to pull off such parlor tricks, making glaring errors in the process, it couldn't add value elsewhere either.
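For concreteness, the Luhn check is the kind of zero-context parlor trick I have in mind; a minimal Go sketch of the test (my own illustration here, not any model's output) might look like:

```go
package main

import "fmt"

// luhnValid reports whether a digit string passes the Luhn checksum:
// starting from the rightmost digit, double every second digit,
// subtract 9 from any doubled value over 9, and require the sum % 10 == 0.
func luhnValid(number string) bool {
	sum := 0
	double := false // the check digit (rightmost) is never doubled
	for i := len(number) - 1; i >= 0; i-- {
		c := number[i]
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9
			}
		}
		sum += d
		double = !double
	}
	return sum%10 == 0
}

func main() {
	fmt.Println(luhnValid("79927398713")) // true: canonical valid Luhn number
	fmt.Println(luhnValid("79927398710")) // false: corrupted check digit
}
```

If a model can't get something this self-contained right, that's what makes me (perhaps unfairly) doubt it elsewhere.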
Why are you hesitating to load all the context you need? (Cursor will start from a couple of starting-point files you explicitly add to the context window and then go track other stuff down.) It's a machine. You don't have to be nice to it.
Just the usual "is this service within our trust perimeter" hesitation, when it comes to sharing source code.
I expected to get better results from my potted tests, and to assemble a justification for expanding the perimeter of trust. This hasn't happened yet, but I definitely see your point.
Presumably it would also be possible to hijack Cursor's network desires and redirect to a local LLM that speaks the same protocol.