1491 vs 1418 Elo (a 73-point gap) means the stronger model is expected to win about 60% of the time.
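
For anyone who wants to check the arithmetic, here is a minimal sketch assuming the standard Elo logistic formula with a 400-point scale. The function name `elo_expected_score` is my own, and the leaderboard's actual fitting procedure may differ from plain Elo.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model.

    Assumes the usual base-10 logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# The two ratings quoted above.
p_stronger = elo_expected_score(1491, 1418)
print(f"stronger model expected score: {p_stronger:.3f}")  # roughly 0.60
```

Strictly speaking this is an expected score, so a tie counts as half a win for each side.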

supermatt · 2025-12-02T16:50:56 1764694256

Probably naive questions:

Does that also mean that Gemini-3 (the top ranked model) loses to mistral 3 40% of the time?

Does that make Gemini 1.5x better, or mistral two-thirds as good as Gemini, or can we not quantify the difference like that?

esafak · 2025-12-02T16:54:10 1764694450

Yes, of course.
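
On the "1.5x" part: what the rating gap gives you directly is odds, not a quality multiplier. A rough sketch, assuming the standard Elo formula (the helper name `elo_odds` is just for illustration):

```python
def elo_odds(rating_diff: float) -> float:
    """Odds (wins per loss) implied by a rating gap under standard Elo."""
    return 10.0 ** (rating_diff / 400.0)


odds = elo_odds(1491 - 1418)      # about 1.5 wins for every loss
win_prob = odds / (1.0 + odds)    # the same ~60% figure, derived from the odds
print(f"odds: {odds:.2f} to 1, win probability: {win_prob:.3f}")
```

So "1.5 to 1 odds of being preferred in a head-to-head vote" is a fair reading under that assumption. "1.5x better" is not something the number can tell you.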

uejfiweun · 2025-12-03T01:28:18 1764725298

Wow. If all the trillions only produce that small a diff... that's shocking. That's the sort of knowledge that could pop the bubble.

JustFinishedBSG · 2025-12-03T10:10:25 1764756625

I wouldn't trust LMArena results much. They measure user preference, and users are highly skewed by style, tone, etc.

You can literally "improve" your model on LMArena by just adding a bunch of emojis.