I'm working on a CRM with a flexible data model, and ChatGPT has written most of the code. I don't use the IDE integrations because I find them too "low level" - I work with GPT more in a sort of "pair programming" session: I give it high level, focused tasks with bits of low level detail if necessary; I paste code back and forth; and I let it develop new features or do refactorings.
This workflow is not perfect, but I am definitely building out all the core features way faster than if I wrote the code myself, and the code is in quite a good state. Quite often I do some of the cleanup myself (refactorings, making sure typings are complete), then update ChatGPT with what the code now looks like.
I think what people miss is there are dozens of different ways to apply AI to your day-to-day as a software engineer. It also helps with thinking things through, architecture, describing best practices.
I share your sentiment. I've written three apps using language models extensively (a different one for each: ChatGPT, Mixtral, and Llama-70B), and while I agree that they were immensely helpful in terms of velocity, there are a bunch of caveats:
- it only works well when you write code from scratch; the context length is too short to be really helpful for working on an existing codebase.
- the output code is pretty much always broken in some way, and you need to be accustomed to doing code reviews to use them effectively. If you trusted the output blindly and had to debug it later, it would be a painfully slow process.
Also, I didn't really notice a significant difference in code quality; even the best model (GPT-4) writes code that doesn't work, and I find it much more efficient to use open models on Groq due to the really fast inference. Watching ChatGPT slowly type is really annoying (I didn't test o1, and I have no interest in doing so because of its very low throughput).
> the context length is too short to be really helpful for working on an existing codebase.
This is kind of true. My approach is to spend a fairly large amount of time copy-pasting code from relevant modules back and forth into ChatGPT so it has enough context to make the correct changes. Most changes I need to make don't need more than 2-3 modules, though.
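For what it's worth, I've semi-automated the gathering step with a small script, something along these lines (the module list and paths are made up for illustration, not my actual project layout):

```python
import pathlib

# Hypothetical module list; in practice I swap in the 2-3 files
# relevant to the change I'm asking for.
MODULES = [
    "src/models/record.py",
    "src/services/search.py",
]

def build_context(paths: list[str]) -> str:
    """Concatenate each module with a filename header, so ChatGPT
    knows which file each chunk of code belongs to."""
    parts = []
    for path in paths:
        source = pathlib.Path(path).read_text()
        parts.append(f"# --- {path} ---\n{source}")
    return "\n\n".join(parts)

if __name__ == "__main__":
    # Pipe into the clipboard, e.g. `python gather_context.py | pbcopy` on macOS.
    print(build_context(MODULES))
```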
> the output code is pretty much always broken in some way, and you need to be accustomed to doing code reviews to use them effectively.
I think this really depends on what you're building. Making a CRM is a very well-trodden path, so I think that helps? But even when it came to asking ChatGPT to design and implement a flexible data model, it did a very good job. Most of the code it's written has worked well. I'd say maybe 60-70% of the code it writes I don't have to touch at all.
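To give a sense of what I mean by a flexible data model: the shape is roughly entity-attribute-value, where record types and their fields are data rather than fixed schema. A simplified sketch of the idea (names are illustrative, not the actual generated code):

```python
from dataclasses import dataclass, field

@dataclass
class FieldDef:
    name: str
    kind: str  # e.g. "text", "number", "date", "reference"

@dataclass
class RecordType:
    name: str  # e.g. "Contact", "Deal"
    fields: list[FieldDef] = field(default_factory=list)

@dataclass
class Record:
    type: RecordType
    values: dict[str, object] = field(default_factory=dict)

    def set(self, name: str, value: object) -> None:
        # Validate against the type's field definitions at runtime,
        # since there is no fixed schema to enforce this for us.
        if name not in {f.name for f in self.type.fields}:
            raise KeyError(f"{self.type.name} has no field {name!r}")
        self.values[name] = value

# Record types can be defined at runtime, then records created against them.
contact = RecordType("Contact", [FieldDef("email", "text"), FieldDef("company", "reference")])
alice = Record(contact)
alice.set("email", "alice@example.com")
```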
The slow typing is definitely a hindrance! Sometimes when it's a big change I lose focus and alt-tab away, like I used to do when building large C++ codebases or waiting for big test suites to run. So that aspect saps productivity. Conversely, though, I don't want to use a faster model that might give me inferior results.
> approach is to spend a fairly large amount of time copy-pasting code from relevant modules back and forth into ChatGPT
It can work, but what a terrible developer experience.
> I'd say maybe 60-70% of the code it writes I don't have to touch at all
I used to write web apps, so the ratio was even higher, I'd say (maybe 80-90% of the code didn't need any modification), but the app itself wouldn't work at all if I didn't make those 10% of changes. And you really need to read 100% of the code, because you won't know upfront where that 10% will be.
> The slow typing is definitely a hindrance! Sometimes when it's a big change I lose focus and alt-tab away, like I used to do when building large C++ codebases or waiting for big test suites to run.
Yeah, exactly: it's xkcd 303 but with "AI processing the response" instead of "compiling". Having an instant response was a game changer for me in terms of focus, and hence productivity.
> I don't want to use a faster model that might give me inferior results
As I said earlier, I didn't really feel the difference in quality, so the switch came without drawbacks.
> Also, I didn't really notice a significant difference in code quality; even the best model (GPT-4) writes code that doesn't work,
Interesting; personally, I have noticed a difference, mostly in how well the models pick up small details and context. Although I do have to agree that the open Llama models are generally fairly serviceable.
Recently I have tended to lean towards Claude 3.5 Sonnet, as it seems slightly better, although that does differ per language as well.
As far as them being slow, I haven't really noticed a difference. I use them mostly through the API with Open WebUI, and the answers come quickly enough.
I use o1 for research rather than coding. If I have a complex question that requires combining multiple ideas or references and checking the result, it's usually pretty good at that.
Sometimes that results in code, but it's the research and cross-referencing that's actually useful with it.
It's interesting to see these LLM tools turning developers into no-code customers. Where tools like visual site builders allowed those without coding experience to build a webpage, LLMs are letting those with coding experience skip the step of coding.
There's not even anything wrong with that, so don't take my comment the wrong way. It is an interesting question what happens at scale, though. We could easily find ourselves in a spot where very few people know how to code, and most of those producing code don't actually know how it works and couldn't find or fix a bug if they needed to. It also means LLMs would be stuck with today's code as a training set until they can invent their own coding paradigms and languages, at which point we're all left in the dust, trusting them to work right.
There is this tool, Aider. It takes your prompt, adds code files (sometimes not all of your code files, but the files it figures are relevant), prepares one long prompt, sends it to an LLM, receives the response, and makes a git commit based on the response. If you'd rather review git commits, it can save you the back-and-forth copy-pasting. https://aider.chat/
Note that the default mode will automatically change and commit the code, which I found counter-intuitive. I prefer using the architect mode, where it first tells you what it is going to do, so you can iterate on the plan before it makes any changes.
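If you'd rather drive it from code than from the interactive chat, aider also has a Python scripting interface. A rough sketch (the model name and file list are placeholders, and the API may have changed since, so check https://aider.chat/docs/scripting.html):

```python
from aider.coders import Coder
from aider.models import Model

# Placeholder model and files; adjust to your setup.
model = Model("gpt-4o")
coder = Coder.create(main_model=model, fnames=["src/app.py"])

# Runs a single instruction against those files; by default aider
# edits the files and commits the result to git.
coder.run("add type hints to the public functions in app.py")
```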