
From the amount of data each successive generation used (which grew by orders of magnitude each time) to the diminishing, roughly logarithmic gains in performance, it's quite clear the steam is running out on shoving more data into it. If one plots data against performance, the curve is horribly logarithmic. Seen another way, the ability of LLMs to transfer learning actually decreases exponentially as they and their data sets get larger. This fits with how humans have to specialise in topics, because the mental models of one field are very difficult to transfer to another.
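To make "horribly logarithmic" concrete, here is a minimal sketch assuming a Chinchilla-style power-law relation between training tokens and loss, loss = E + A / D**alpha. The constants are made up for illustration, not fitted to any real model:

    # Hypothetical power-law scaling: loss = E + A / D**alpha
    # (constants are illustrative, not fitted to any published model)
    E, A, alpha = 1.7, 400.0, 0.34

    def loss(tokens: float) -> float:
        return E + A / tokens**alpha

    for tokens in [1e9, 1e10, 1e11, 1e12, 1e13]:
        print(f"{tokens:.0e} tokens -> loss {loss(tokens):.3f}")

    # Each 10x increase in data buys a smaller absolute improvement,
    # and the curve flattens toward the irreducible term E.

Under these (assumed) constants, going from 1e9 to 1e10 tokens improves loss by about 0.19, while going from 1e12 to 1e13 improves it by less than 0.02, which is the diminishing-returns shape the comment is describing.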


If that's the case, why aren't we yet seeing specialized LLMs for, say, only JavaScript, or for translating from English to Portuguese, etc.?


We are likely going to get there. As with steam and combustion engines (and other core technologies like computers, wireless transmission, etc.), there's first a massive rush to increase raw power, at the cost of efficiency and effectiveness for more niche use cases. Then the technology is specialised to various use cases, with large improvements in efficiency and effectiveness. My own prediction for where most gains will now come from is:

1) Creating new "harnesses" for models that connect them to various systems, APIs, frameworks, etc. While this sounds "trivial", a lot of gains can come from it. The voice version of ChatGPT was (apparently) amazing, yet all you really had to do was add a speech-to-text layer in front of the model and a text-to-speech layer behind it (see the sketch after this list).

2) Increasing specialisation of models. I predict that, over time, end-user AI companies (i.e. those that just use models rather than develop them) will use more and more specialised models. The current, almost monolithic, setup where every service from text summarisation to homework help is plugged into the same model will slowly change.
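For point 1, a minimal sketch of such a voice "harness" around an existing text model. The functions transcribe, complete and synthesize are placeholders for whatever speech-to-text, LLM and text-to-speech services you actually use; the point is that the model itself is untouched and the value comes from the wrapper:

    # Hypothetical voice harness: the LLM is unchanged, the gains
    # come from the layers bolted on around it.

    def transcribe(audio: bytes) -> str:
        raise NotImplementedError  # call your speech-to-text service here

    def complete(prompt: str) -> str:
        raise NotImplementedError  # call your LLM here

    def synthesize(text: str) -> bytes:
        raise NotImplementedError  # call your text-to-speech service here

    def voice_turn(audio_in: bytes) -> bytes:
        text_in = transcribe(audio_in)   # speech -> text
        text_out = complete(text_in)     # text -> text (the existing model)
        return synthesize(text_out)      # text -> speech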


We kind of have; that's what fine-tuning is trying to achieve.

We haven't seen wholesale specialised models yet because creating foundation models is expensive and difficult, and the current highest ROI is in making a general model.
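For reference, a minimal sketch of that kind of specialisation via fine-tuning with the Hugging Face transformers Trainer. The base model, data file and hyperparameters are placeholders (a real run would add evaluation, checkpointing, and probably parameter-efficient methods like LoRA):

    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    base = "gpt2"  # placeholder base model
    tok = AutoTokenizer.from_pretrained(base)
    tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(base)

    # Placeholder domain corpus, e.g. a JavaScript-only text file.
    data = load_dataset("text", data_files={"train": "js_corpus.txt"})
    data = data.map(lambda ex: tok(ex["text"], truncation=True, max_length=512),
                    batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="js-specialist",
                               num_train_epochs=1,
                               per_device_train_batch_size=4),
        train_dataset=data["train"],
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
    )
    trainer.train()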


> to the decreasing, logarithmic performance

In what measure, loss? Loss can't go below the inherent entropy of the text: with overfitting it could get closer to 0, but never all the way, since with next-token prediction the same prefix can be followed by multiple different tokens.
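A toy illustration of that floor: if the same prefix is followed by different tokens in the data, even a model that exactly matches the true conditional distribution still pays the conditional entropy, so loss cannot reach 0. The counts below are made up:

    import math
    from collections import Counter

    # Toy corpus: a single prefix followed by different next tokens.
    continuations = ["sat", "sat", "sat", "ran", "ran", "slept", "purred"]
    counts = Counter(continuations)
    total = sum(counts.values())

    # The best possible model predicts the true conditional distribution,
    # and its expected next-token loss equals the conditional entropy.
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    print(f"irreducible next-token loss for this prefix: {entropy:.3f} nats")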

With respect to hallucinations, 4 got incredibly better compared to 3.


In intelligence/performance. It's admittedly a fuzzy notion; most benchmarks will probably show decreasing gains between generations. As with time/space complexity, trying to debate what performance/intelligence actually is gets into a million definitions, caveats and technicalities. But a relative comparison between inputs and outputs still gives us useful information.

The inputs going into training these models - data, compute and parameters - have grown by many orders of magnitude with each generation. There's a lot of fuzziness about how much better each generation has become, but clearly 4 is not many orders of magnitude better than 3 by any reasonable definition. This mental model isn't useful for saying how good each generation is, but it is quite useful for seeing the trend and making long-term predictions.



