ChatGPT is gonna really fuck up SO. I used it just now to figure out some rarely-used Git feature, and got an answer quicker than SO or DuckDuckGo.
With the questions no longer being public, the search engines will become outdated.
Maybe I should be exporting my ChatGPT chats and contributing them to something equivalent to Common Crawl? I guess I can do that with a machine-readable blog "Everything I learned from asking ChatGPT this year"
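A rough sketch of what that export-to-blog step could look like, assuming the conversations.json layout ChatGPT's data export currently ships (the schema is undocumented, so the field names here are an assumption, not a stable API):

```python
import json
from pathlib import Path

# Sketch: turn ChatGPT's data-export file (conversations.json) into one
# markdown file per conversation. The node structure ("mapping", "message",
# "author", "parts") matches the export format I've seen, but it's an
# undocumented schema and may change.

def conversation_to_markdown(conv: dict) -> str:
    lines = [f"# {conv.get('title', 'Untitled')}"]
    for node in conv.get("mapping", {}).values():
        msg = node.get("message")
        if not msg:
            continue
        role = msg["author"]["role"]
        parts = msg.get("content", {}).get("parts", [])
        text = "\n".join(p for p in parts if isinstance(p, str)).strip()
        if text:
            lines.append(f"\n**{role}:**\n\n{text}")
    return "\n".join(lines)

conversations = json.loads(Path("conversations.json").read_text())
out = Path("blog")
out.mkdir(exist_ok=True)
for i, conv in enumerate(conversations):
    (out / f"{i:04d}.md").write_text(conversation_to_markdown(conv))
```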
For now, yes, because the question you asked was likely answered by someone on Stack Overflow (or Reddit, or Github, or wherever else) already and then made its way to the LLM's training data set. What happens when a brand new language or library or tool is released though and you run into a unique problem for the first time? When all human forums have been shut down, and AI still isn't intelligent enough to figure out the answer on its own?
Counterargument: This will be solved mostly by documentation.
Historically, most of my SO usage boils down to:
1) finding out how to implement something esoteric, which turns up a clever solution or an under-described feature flag in a function/tool
2) finding a workaround for a broken feature in some software (>70% of the time via a link to a GitHub issue in the description)
If we consider LLMs as, functionally, an information-retrieval system driven by natural-language subroutines, then a web-browsing-enabled LLM should be able to go to the source documentation and return a working answer, even when the model wasn't pretrained on that source.
So as long as there is good documentation for a particular piece of software, this should theoretically generalize to tools that didn't exist at training time. At least long enough for a new training dataset to accumulate from people hitting the problem for the first time.
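To make that concrete, here's a minimal sketch of the retrieval idea: fetch the current docs at answer time and condition the model on them instead of on its pretraining. `ask_llm` is a stand-in for whichever chat-completion API you use, and the truncation is deliberately naive:

```python
import requests

# Documentation-grounded answering, sketched: retrieve docs at query time
# and put them in the prompt, so the answer doesn't depend on the tool
# having existed when the model was trained.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wire up your LLM provider here")

def answer_from_docs(question: str, doc_urls: list[str]) -> str:
    docs = []
    for url in doc_urls:
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        docs.append(f"--- {url} ---\n{resp.text[:20_000]}")  # naive truncation
    prompt = (
        "Answer using ONLY the documentation below. "
        "If the docs don't cover it, say so.\n\n"
        + "\n\n".join(docs)
        + f"\n\nQuestion: {question}"
    )
    return ask_llm(prompt)
```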
Side note:
In some sense, the foundation-model labs are already aggregating question-answer pairs (of the kind that used to land on Stack Overflow) from their user data. I wouldn't be surprised if they created a Stack Overflow clone at some point to open-source the dataset creation and labeling effort.
This is basically what Community Notes is for X, and now Facebook.
Counter-counterargument: "so long as there is good documentation" feels a bit like betting success on the deliverable that matters least to the people funding a project, and the process step that interests the people building it least, going really well.
I think there's a difference between Stack Overflow and/or Reddit vs. specific community led forums, or even GitHub, where questions also get answered.
Just considering Stack Overflow for a moment: they exist to profit from consolidating questions and answers. When the LLM can answer most questions more efficiently, they've lost much of their value proposition as a product... and perhaps their business along with it.
Many of the community forums, however, tend to not be businesses per se. Sure they'll see less traffic, but that might not matter to them. In fact, it might even be better to an extent because they often aren't monetizing their services and so LLMs carrying some weight can help reduce costs. Under those circumstances, LLMs may not be nearly so bad and they, themselves, will still have sources of training data.
For example, I read the Elixir Forums for language announcements, feature discussions, occasionally to ask questions that I can't resolve with research, and even to answer some questions. I've also got LLMs fairly well integrated into my workflow. I don't see that pattern changing: neither less Elixir Forum nor less reliance on the LLM. What has changed is I don't use search as much as I used to, nor do I use Stack Overflow as much.
So I do expect the big aggregators to go away. The forums not tied to monetizing their knowledge transfer I expect to see less overall traffic, but not less meaningful and substantive interaction.
This is great for something like Git and terrible for something like how to make the borrow checker happy in Rust. SO was the go-to platform for questions that require human ingenuity.
It's like the shift from public mailing lists to Discord: it's all unindexed. Sometimes when I'm contemplating a new library or thing-bob, I check out a bunch of the SO questions tagged for it. For something decently popular it can give you a good view of how others are using it and where they're stubbing their toes and having to ask for help. Skimming GitHub issues doesn't give you quite the same signals.
Agreed. I wish there were some place to aggregate what people consider "good" conversations they've had, one that doesn't just suck up the data for itself and lock it away.
Assuming that people only share conversations they think are good, would that be bad? Isn’t that the basis of RLHF?
There are a few times on Reddit that I want to explain something that I know well. But it will be a long post.
I’ll be lazy and ask ChatGPT the question, then either verify it’s correct based on what I know, ask it to verify its answer on the web (the paid version has had web search for over a year), or guide it to the correct answer if I notice something is incorrect.
Then I’ll share the conversation as the answer, tell the poster to read through the entire conversation, and point out that I didn’t just naively ask ChatGPT. That will be obvious from my chat session.
I’ve had pretty good luck when having it write Python automation scripts around AWS using Boto3.
If it’s a newer API that ChatGPT isn’t trained on, I would either tell it where to find the newest documentation for the API on the web or paste the documentation in.
It usually worked pretty well.
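For a sense of scale, the scripts in question are roughly this shape (an illustrative example written for this comment, not one of the generated ones):

```python
import boto3

# Example of the kind of small AWS automation script meant above: list all
# stopped EC2 instances in a region, using the paginator so large accounts
# are handled correctly.

def stopped_instances(region: str = "us-east-1") -> list[str]:
    ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_instances")
    pages = paginator.paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["stopped"]}]
    )
    ids = []
    for page in pages:
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                ids.append(instance["InstanceId"])
    return ids

if __name__ == "__main__":
    for instance_id in stopped_instances():
        print(instance_id)
```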
If the author of the library wrote good documentation and sample code, then hypothetically, if ChatGPT were trained on it, you wouldn’t need Stack Overflow.
Apple is training its own autocomplete for Swift on its documentation and its own sample code.
We don't have to guess. Just look at languages which have been around for a while and achieved some baseline level of popularity, enough to have a decent amount of public code available, like Elixir.
I haven't found an LLM that could reliably produce syntactically correct code, much less logically correct code.
Since LLMs have been a thing, I’ve been heavily involved in the AWS ecosystem and automation.
ChatGPT is well trained on the AWS SDK for various languages. I can usually ask it to do something that works out to around a 100-200 line Python script, and it gets it right. Especially once it got web search capabilities, I could tell it to “verify Boto3 (the AWS SDK for Python) functions on the web”.
I’ve also used it to convert some of my handwritten AWS SDK based scripts between languages depending on the preferences of the client - C#, Python, JavaScript and Java.
It also does pretty well at converting CloudFormation to idiomatic CDK and Terraform.
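The kind of translation I mean, sketched by hand for a trivial resource (CDK v2, Python; the resource names are just for illustration):

```python
import aws_cdk as cdk
from aws_cdk import aws_s3 as s3

# The CloudFormation input:
#
#   Resources:
#     LogBucket:
#       Type: AWS::S3::Bucket
#       Properties:
#         VersioningConfiguration:
#           Status: Enabled
#
# becomes the idiomatic CDK construct below.

class LogBucketStack(cdk.Stack):
    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)
        s3.Bucket(self, "LogBucket", versioned=True)

app = cdk.App()
LogBucketStack(app, "LogBucketStack")
app.synth()
```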
I was going into one project to teach the client how to create deployment pipelines for Java based apps to Lambda, EC2 and ECS (AWS’s Docker orchestration service).
I didn’t want to use their code for the proof of concept/MVP. But I did want to deploy a sample Java API. I hadn’t touched Java in over 20 years. I was a C#/Python/Node/(barely) Go developer.
I used ChatGPT to create a sample CRUD API in Java that connected to a database. It worked perfectly. I also asked about proper directory structure.
It didn’t work perfectly with helping me build the Docker container. But it did help.
On another note: it’s not too much of a leap to see how Visual Studio or ReSharper could integrate an LLM better and, with a static language, guarantee that the code is at least syntactically correct and that the functions being called exist in the standard library or the solution.
They can already do quick, real-time warnings and errors as you type if your code won’t compile.
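The cheapest version of that check doesn't even need the compiler. A sketch in Python, using the ast module to flag calls to names a generated snippet never defines or imports (a real IDE integration would lean on the compiler or language server instead, and this ignores assigned names, methods, and star imports):

```python
import ast
import builtins

# Naive static check for LLM-generated code: parse it (catching outright
# syntax errors) and report calls to names that aren't defined, imported,
# or builtins.

def undefined_calls(source: str) -> set[str]:
    tree = ast.parse(source)  # raises SyntaxError if the code doesn't parse
    defined = {node.name for node in ast.walk(tree)
               if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                                    ast.ClassDef))}
    imported = {alias.asname or alias.name.split(".")[0]
                for node in ast.walk(tree)
                if isinstance(node, (ast.Import, ast.ImportFrom))
                for alias in node.names}
    known = defined | imported | set(dir(builtins))
    called = {node.func.id for node in ast.walk(tree)
              if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)}
    return called - known

print(undefined_calls("import math\nprint(math.sqrt(2))\nfrobnicate(3)"))
# -> {'frobnicate'}
```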
Its own answers, with feedback about whether the answers seem to have worked.
Learning to predict which words lead to a successful solution (rather than just words that look like existing speech) may prove to be a richer training signal than SO originally was.
Most of the time this would happen in the form of an interactive debugging session, with immediate feedback.
Code review is its own domain. In general at some point LLMs need to be trained with a self-evaluation loop. Currently their training data contains a lot of "smart and knowledgeable human tries to explain things". And they average out to conversation that is "smart and knowledgeable...about everything". That won't get us to, "Recognizably thinks of things that no human would have." For that we need to get it producing content that is recognizably higher than human quality.
For that we should find ways to optimize existing models for an evaluation function that says, "Will do really well on self-review." Then it can learn to not just give answers that help with interactive debugging, but answers that also hold up under more strenuous code review, which it taught itself to do much the way AlphaZero teaches itself game strategies.
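As a conceptual sketch (every function below is a placeholder, not a real training API), the loop would look something like:

```python
# Self-evaluation training loop, sketched: generate candidate answers,
# keep the ones that survive both executable feedback and the model's own
# strenuous review, and fine-tune on the survivors -- loosely the
# AlphaZero-style self-improvement pattern described above.

def generate(model, problem, n=8): ...        # sample n candidate answers
def run_tests(problem, answer) -> bool: ...   # immediate feedback: does it work?
def self_review(model, answer) -> float: ...  # strenuous-review score, 0..1
def fine_tune(model, examples): ...           # update the model on kept examples

def self_improvement_round(model, problems):
    keep = []
    for problem in problems:
        for answer in generate(model, problem):
            if run_tests(problem, answer) and self_review(model, answer) > 0.9:
                keep.append((problem, answer))
    fine_tune(model, keep)  # the next round starts from a slightly better model
    return model
```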