At this point, any article that makes claims about "LLMs" rather than specific model versions lacks credibility.

The current version of GPT-4 is very different from most existing LLMs (including the previous version of ChatGPT), and the next release will be different again.

Google just released PaLM 2 publicly. It is significantly better than what people saw with the initial Bard versions. They also have a code-generation model that has not been released publicly yet.

The open-source models are also gaining capabilities and getting new releases routinely.

Claude now has a new release with a 100K-token context window.

All of these will perform differently on the negation issue.



Did you read the article or just the title? It mentions the specific models the researchers tested and notes that increasing model size did not seem to offer much improvement on this metric. It also ends with a discussion of research into methods for improving performance on queries involving negation.


I read the article. The main point, which is in the title, is directed at LLMs in general. They drew the wrong conclusion by making the title and main point too general.

It actually would have seemed like a valid conclusion (although still too general) had the article come out some months ago. But GPT-4 and the very latest model versions from other companies show they were over-generalizing.

Also, model size isn't necessarily the determining factor.


> All of these will perform differently on the negation issue.

That's a bit of a cop-out. The classical, logical (in the mathematical sense) way to do it is to have a feature that explicitly represents negation. Embeddings don't work like this, so it's quite possible that the presence of negation gets pushed aside in the processing of the answer, simply because other features matter more.
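
As a rough illustration (my sketch, not from the article; it assumes the sentence-transformers library and the all-MiniLM-L6-v2 model):

    # pip install sentence-transformers
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # One added token flips the meaning but barely perturbs the vector.
    a = model.encode("The patient shows signs of pneumonia.")
    b = model.encode("The patient shows no signs of pneumonia.")

    # Cosine similarity typically comes out very high (~0.9): the negation
    # is drowned out by everything the two sentences share.
    print(util.cos_sim(a, b))

If most of the vector encodes topic and phrasing, the single "no" contributes almost nothing, which is consistent with the behavior the article describes.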

It's also to be expected that such models will have problems with similar abstract concepts whose effect on the interpretation is larger than their "physical" presence in the text would suggest, such as nested existential quantifiers, and consequently with logical proof. By enlarging the model, you can fake it a bit.
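
For instance (my example, not the article's), these two formulas contain exactly the same symbols; only the nesting order of the quantifiers differs, yet the meaning flips:

    \forall x \,\exists y :\; \text{Loves}(x, y)  % everyone loves someone (possibly different someones)
    \exists y \,\forall x :\; \text{Loves}(x, y)  % there is one person whom everyone loves

A model that keys on which symbols are present, rather than how they are nested, will treat these as near-identical.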



