Watching posts shift in real time is very entertaining.
First it's not generally intelligent because it can't tackle new things; then, when it obviously does, it's not generally intelligent because it's overfit.
You've managed to say essentially nothing of substance. So it passes because the structure and concepts are similar. Okay. Are students preparing for tests working with alien concepts and structures, then? Because I'm failing to see the big difference here.
A model isn't overfit just because you've declared it so, and unless GPT-4 has several trillion parameters, general overfitting is very unlikely. But I doubt you care about any of that.
Can you devise a test to properly assess what you're asserting?
I have no idea what is shifting in real time. I formed this opinion of GPT-4 by running it through several benchmarks and making adjustments to them, so my view is empirical, and I formed it one week after the model came out.
Your post says nothing of substance because it offers no substantive rebuttal; it just attacks a position with a hand-waved argument, without any clear understanding of how parameters actually affect a model's outputs.
You seem to have a serious attitude problem in your responses, so this is my last one.
It's proprietary company evaluation data for a specific domain related to software development, a domain in which OpenAI is actively trying to improve performance.
Anyway, enjoy your evening. If you want to have a reasonable discussion without being unpleasant, I'd be happy to discuss further.
How does that empirically prove general overfitting?
People study from books, teachers, or other sources of knowledge, internalize it, and relate it to other concepts, and no one considers that a form of overfitting.
You basically said what amounts to "it overfits to concepts," which is honestly quite ridiculous. Not only is that a standard humans would fail, it's also not what "overfit" is generally taken to mean.
I agree with the parent post. I can get ChatGPT to solve a basic word problem, but if I add a small wrinkle that a human would understand, it fails hard. "Overfitted" seems apt.
Stop confusing ChatGPT with GPT-4.
Most common rookie mistake. GPT-4 is way stronger at "solving problems" than ChatGPT. I used to bait ChatGPT with basic logic or conversion problems; I stopped doing that with GPT-4, since beating it would take too much effort.
Dealing with words on the level of their constituent letters is a known weakness of OpenAI’s current GPT models, due to the kind of input and output encoding they use. The encoding also makes working with numbers represented as strings of digits less straightforward than it might otherwise be.
In the same way that GPT-4 is better at these things than GPT-3.5, future GPT models will likely be even better, even if only by the sheer brute force of their larger neural networks, more compute, and additional training data.
(To see an example of the encoding, you can enter some text at https://platform.openai.com/tokenizer. The input is presented to GPT as a series of integers, one for each colored block.)
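To make the point about encoding concrete, here is a minimal Python sketch. It assumes a tiny, hypothetical vocabulary (TOY_VOCAB) and uses greedy longest-match segmentation as a simplified stand-in for the byte-pair encoding GPT models actually use; the token IDs are made up and do not correspond to OpenAI's tokenizer.

```python
# Toy subword tokenizer: a simplified stand-in for the byte-pair encoding
# used by GPT models. The vocabulary below is HYPOTHETICAL, not OpenAI's.
TOY_VOCAB: dict[str, int] = {"straw": 0, "berry": 1, "123": 2, "45": 3, " ": 4}

# Add single characters as a fallback so any lowercase letters/digits encode.
for ch in "abcdefghijklmnopqrstuvwxyz0123456789":
    TOY_VOCAB.setdefault(ch, len(TOY_VOCAB))


def toy_tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-match: at each position, take the longest vocab entry."""
    longest = max(len(piece) for piece in vocab)
    ids: list[int] = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, shrinking until one matches.
        for length in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                ids.append(vocab[piece])
                i += length
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids


if __name__ == "__main__":
    print(toy_tokenize("strawberry", TOY_VOCAB))
    print(toy_tokenize("12345", TOY_VOCAB))
```

With this toy vocabulary, "strawberry" becomes [0, 1] and "12345" becomes [2, 3] (the pieces "123" and "45"). In both cases the model receives two opaque integers rather than individual letters or digits, which is why questions about spelling or digit-level arithmetic require it to have learned what each token contains.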
Second, you're going to have to give specific examples of what a small wrinkle is. I've seen "can't solve a variation of a common word problem," but that's a failure mode of people too. And if you reword the question so it doesn't bias common priors, or even just tell it that it's making a wrong assumption, it often gets it right.
> Watching posts shift in real time is very entertaining. First it's not generally intelligent because it can't tackle new things; then, when it obviously does, it's not generally intelligent because it's overfit.
This wasn't new in the same way that making any test about Romeo and Juliet isn't new. You're still going to the same sources for the answer. It's the exact same goalpost.