There are a lot of ways that AI speeds up software development processes that aren't the actual software development.
Lately I find that I do not allow LLMs to write any code I am interested in maintaining. Or if they do, I have to micromanage them and it usually takes longer. They produce mediocre solutions, and often add redundant state ("Why did you add that state?" "Because we might need it in the future").
That said, they are extremely good at:
- Dev tools: creating debug tooling, debug screens, scripts that get the job done
- Auxiliary development: landing pages, "what's new" screens, tedious boilerplate, gathering strings for localization
- Prototyping: building full implementations quickly so you can see all the problems rather than having to anticipate them
- Pure transformation: porting from one language or paradigm to another
So while I agree with the article that the actual spec of the feature you are building needs just as much human thought, regardless of AI, the speed-ups around that are worth exploring.
An example I have from a recent feature development is adding CarPlay support to an existing app. We could have talked about it and designed it for weeks, but with an LLM I was able to get it running in my car in an hour, go for a drive, and feel it to understand whether it was a valuable direction.
The code was a mess, most of it had to be thrown away, and the LLM couldn't even get the initial build functional (not much CarPlay training data, I expect). But it was an accelerator to answer the question "is it worth investing more time in this?"
To repurpose a quote from Walt Disney, I don’t make software to make money, I make money to make more software.
I want my hobby project to be my job, because I don’t want to work for someone else. I want creative control, freedom to explore and ship ideas, and financial stability.
The only way to get there, that I can see, is to charge for my work.
I always wondered how this compares to the 1999 algorithm Texture Synthesis by Non-parametric Sampling [1]. The results look very similar to my eyes. Implementation here [2] — has anyone tried both?
Woah. As an adult man with five kids, two of them infants, the most natural thing in the world is for them to be present in almost every second of my life.
It’s not difficult at all. Minutes after birth, naked baby was on my naked chest, and bonding started. This never felt contrary to my instinct.
Ok, you may consider it easy after 5, and kudos to you, but kids are definitely not “not difficult.”
I agree that it’s the most natural thing, and I consider most of my time spent elsewhere to be a waste, but our youngest is very active and worrying about her wellbeing for extended periods is definitely exhausting!
We were torn between MCP and Skills for our product and ended up offering both. We built a CLI tool that could interface with the MCP server [1]. It seems redundant, but our app is a coding app on iOS (Codea), and the issue with offering only a plain MCP server was that the agentic coding harness found it harder to do its job.
With the CLI the agent could check out the project, work on it locally with its standard file editing / patching / reading tools, then push the work back to device. Run and debug on device, edit locally, push.
With MCP the agent had to query the MCP server for every read and write and was no longer operating in its normal coding loop. It still works, though, and as a user you can choose to bypass the CLI and connect directly via MCP.
The MCP server was valuable as it gave us a consistent and deterministic language to speak. The CLI tool + Skill was valuable for agentic coding because it allowed the coding work to happen with the standard editing tools used by agents.
The CLI also gave us device discovery. So the agent can simply discover nearby devices running Codea and get to work, instead of a user having to add a specific device via its IP address to their agent.
I tried to live like this for a while but found I could not separate applications into spaces.
I would try setting up a space for, e.g., all my communication stuff. But suddenly I’d need to drag-and-drop an image from my image editor into Slack. Or I’d want to drag a graphic from Safari into Final Cut Pro. Or any number of cross-workspace operations.
How do you handle this with spaces? Do you initiate the drag, tap the space hot key, then drop?
I had Opus 4.6 running on a backend bug for hours. It got nowhere. Turned out the problem was AWS X-Ray swizzling the fetch method and not handling the same argument types as the original, which led to cryptic errors.
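To make that failure mode concrete, here is a minimal sketch of how a patched fetch can break when it assumes a narrower input than the original accepts. This is not the actual X-Ray code; `naiveTrace`, `safeTrace`, `resolveUrl` and the simplified `FetchInput` / `FetchLike` types are all hypothetical stand-ins for illustration.

```typescript
// Simplified stand-ins for the input shapes fetch accepts:
// a string, a URL-like object, or a Request-like object.
type UrlLike = { href: string };
type RequestLike = { url: string };
type FetchInput = string | UrlLike | RequestLike;
type FetchLike = (input: FetchInput) => Promise<string>;

// Hypothetical buggy wrapper: it assumes the input is always a string.
function naiveTrace(original: FetchLike, log: string[]): FetchLike {
  return (input) => {
    // Throws a TypeError when input is an object, because objects
    // have no split method. The error never mentions tracing.
    log.push((input as string).split("?")[0]);
    return original(input);
  };
}

// Normalise every accepted input shape to a URL string.
function resolveUrl(input: FetchInput): string {
  if (typeof input === "string") return input;
  if ("href" in input) return input.href;
  return input.url;
}

// Wrapper that handles the same argument types as the original.
function safeTrace(original: FetchLike, log: string[]): FetchLike {
  return (input) => {
    log.push(resolveUrl(input).split("?")[0]);
    return original(input);
  };
}
```

With the naive wrapper, any caller that passes a Request-like or URL-like object gets a TypeError from inside the wrapper rather than from its own code, which is roughly why this kind of bug is hard to trace from the stack alone.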
I had Opus 4.6 tell me I was "seeing things wrong" when I tried to have it correct some graphical issues. It got stuck in a loop of re-introducing the same bug every hour or so in an attempt to fix the issue.
I'm not disagreeing with your experience, but in my experience it is largely the same as what I had with Opus 4.5 / Codex / etc.
Haha, reminds me of an unbelievably aggravating exchange with Codex (GPT 5.4 / High) where it was unflinchingly gaslighting me about undesired behavior still occurring after a change it made that it was adamant simply could not be happening.
It started by insisting I was repeatedly making a typo and still would not budge even after I started copy/pasting the full terminal history of what I was entering and the unabridged output, and eventually pivoted to darkly insinuating I was tampering with my shell environment as if I was trying to mislead it or something.
Ultimately it turned out that it forgot it was supposed to be applying the fixes to the actual server instead of the local dev environment, and had earlier in the conversation switched from editing directly over SSH to pushing/pulling the local repo to the remote due to diffs getting mangled.
The example given in the article is acceptance criteria for a login/password entry flow. This is fairly easy to spec out in terms of AC and TDD.
I have been asking these tools to build other types of projects where it seems much more difficult to verify without a human-in-the-loop. One example: I asked Codex to build a simulation of the solar system using a Metal renderer. It produced a fun working app quickly.
I asked it to add bloom. It looped for hours, failing. I had to verify manually because, even from images, it couldn't tell what was right and wrong. It only got it right when I pasted a how-to-write-a-bloom-shader-pass-in-Metal blog post into it.
Then I noticed that all of the planet textures were rotating oddly every time I orbited the camera. Codex got stuck in another endless loop of "Oh, the lookAt matrix is in column major, let me fix that <proceeds to break everything>." or focusing (incorrectly) on UV coordinates and shader code. Eventually Codex told me what I was seeing "was expected" and that I just "felt like it was wrong."
When I finally realised the problem was that Codex had drawn the planets with back-facing polygons only, I reported the error, to which Codex replied, "Good hypothesis, but no."
I insisted that it change the culling configuration and then it worked fine.
These tools are fun, and great time savers (at times), but take them out of their comfort zone and it becomes real hard to steer them without domain knowledge and close human review.
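For anyone unfamiliar with the culling problem described above, here is a rough sketch of the idea, written in TypeScript rather than Metal so it can run anywhere. A rasterizer classifies each triangle as front- or back-facing from the winding order of its projected vertices, then discards whichever side the cull mode names. The names `signedArea2` and `isDrawn` and the string-valued settings are made up for this illustration and are not Metal API.

```typescript
type Vec2 = [number, number];
type Winding = "clockwise" | "counterClockwise";
type CullMode = "none" | "front" | "back";

// Twice the signed area of a projected triangle, with the y axis pointing up.
// Positive means the vertices run counter-clockwise.
function signedArea2(a: Vec2, b: Vec2, c: Vec2): number {
  return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]);
}

// Decide whether a triangle survives culling, given which winding
// counts as front-facing and which side is being culled.
function isDrawn(
  tri: [Vec2, Vec2, Vec2],
  frontFacing: Winding,
  cull: CullMode
): boolean {
  const area = signedArea2(tri[0], tri[1], tri[2]);
  if (area === 0) return false; // degenerate triangle covers no pixels
  const winding: Winding = area > 0 ? "counterClockwise" : "clockwise";
  const isFront = winding === frontFacing;
  if (cull === "none") return true;
  return cull === "back" ? isFront : !isFront;
}
```

If the mesh is wound counter-clockwise but the renderer treats clockwise as front-facing, culling "back" throws away exactly the triangles you wanted to see and only the far side of each sphere survives. On a textured planet that can look like the texture sliding around as the camera orbits, rather than like an obvious hole.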
That's a pretty extreme take. I've been using the Mac since about 2001. I like Tahoe and a well designed Tahoe app can look really nice on the platform. There are bugs, inconsistencies and other issues, but it doesn't feel that different than many previous macOS / OS X releases.
I believe you can do regular hard edged intersections. You can see in his operator list some are listed as “smoothSubtract” and some are just “subtract”.
It’s just easy to do the melding thing with SDFs so a lot of people do it.
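To show how small the difference is, here is a sketch of the two operators on raw distance values. The hard version is the standard max-based subtraction; the smooth version is one commonly used polynomial blend, and the operators in the project being discussed may well be implemented differently. The function names and the `k` blend-radius parameter are my own choices for this example.

```typescript
// Clamp x into the range [lo, hi].
function clamp(x: number, lo: number, hi: number): number {
  return Math.min(Math.max(x, lo), hi);
}

// Linear interpolation from a to b by t.
function mix(a: number, b: number, t: number): number {
  return a * (1 - t) + b * t;
}

// Hard-edged subtraction: carve the cutter shape out of the base shape.
// Both arguments are signed distances at the same sample point.
function sdfSubtract(cutter: number, base: number): number {
  return Math.max(-cutter, base);
}

// Smooth subtraction: same result far from the seam, but blended
// over a band of roughly width k where the two surfaces meet.
function sdfSmoothSubtract(cutter: number, base: number, k: number): number {
  const h = clamp(0.5 - (0.5 * (base + cutter)) / k, 0, 1);
  return mix(base, -cutter, h) + k * h * (1 - h);
}
```

Away from the seam the two functions return the same distance, so the only cost of the melded look is one extra parameter and a few multiplies. That is probably why it shows up so often in SDF demos.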