At this point, any article that makes claims about "LLMs" rather than specific model versions lacks credibility.

The current version of GPT-4 is very different from most existing LLMs (including the previous version of ChatGPT), and the next release will be different again.

Google just released PaLM 2 publicly. It is significantly better than what people saw with the initial Bard versions. They also have a code-generation model that has not been released publicly yet.

The open-source models are also gaining capabilities and getting new releases routinely.

Claude now has a new release with a 100K-token context window.

All of these will perform differently on the negation issue.



Did you read the article or just the title? It mentions the specific models the researchers tested and notes that increasing model size did not seem to offer much improvement on this metric. It also ends with a discussion of research into methods for improving performance on queries involving negation.


I read the article. The main point, which is in the title, is directed at LLMs in general. They drew the wrong conclusion by making the title and main point too general.

It actually would have seemed like a valid conclusion (although still too general) had the article come out some months ago. But GPT-4 and the very latest model versions from other companies show they were over-generalizing.

Also, model size isn't necessarily the determining factor.


> All of these will perform differently on the negation issue.

That's a bit of a cop-out. The classical, logical (in the mathematical sense) way to do it is to have a feature that explicitly represents negation. Embeddings don't work like this, so it's quite possible that the presence of negation gets pushed aside in the processing of the answer, simply because other features matter more.
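
As a rough illustration (my sketch, not from the article; it assumes the sentence-transformers library and the all-MiniLM-L6-v2 model):

    # pip install sentence-transformers
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # One added token flips the meaning but barely perturbs the vector.
    a = model.encode("The patient shows signs of pneumonia.")
    b = model.encode("The patient shows no signs of pneumonia.")

    # Cosine similarity typically comes out very high (~0.9): the negation
    # is drowned out by everything the two sentences share.
    print(util.cos_sim(a, b))

If most of the vector encodes topic and phrasing, the single "no" contributes almost nothing, which is consistent with the behavior the article describes.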

It's also to be expected that such models will have problems with similar abstract concepts whose effect on the interpretation is larger than their "physical" presence in the text would suggest, such as nested existential quantifiers, and consequently with logical proof. By enlarging the model, you can fake it a bit.
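
For instance (my example, not the article's), these two formulas contain exactly the same symbols; only the nesting order of the quantifiers differs, yet the meaning flips:

    \forall x \,\exists y :\; \text{Loves}(x, y)  % everyone loves someone (possibly different someones)
    \exists y \,\forall x :\; \text{Loves}(x, y)  % there is one person whom everyone loves

A model that keys on which symbols are present, rather than how they are nested, will treat these as near-identical.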



