
Tried an English-to-Greek translation with the smaller one. The results were hideous. Mistral Small is leaps and bounds better. Also, I don't get why they ship 4-bit quantization by default; in my experience, anything below 8-bit and the model fails to understand long prompts. They gutted their own models.


They used quantization-aware training, so the quality loss should be negligible. Doing anything with this model's weights would be a different story, though.
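To illustrate what quantization-aware training means in practice, here's a minimal sketch of the general idea, not the model's actual recipe (assumes PyTorch; the 4-bit symmetric per-channel scheme and the straight-through estimator are illustrative choices): weights are rounded to the low-bit grid on every forward pass during training, so the network learns parameters that already tolerate the rounding and loses little quality when the quantized checkpoint is exported.

    import torch
    import torch.nn as nn

    class FakeQuantLinear(nn.Module):
        """Linear layer whose weights are fake-quantized to int4 on the forward pass."""
        def __init__(self, in_features, out_features, bits=4):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
            self.qmin, self.qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

        def fake_quant(self, w):
            # Symmetric per-channel scale: map the largest magnitude to the int4 range.
            scale = (w.abs().amax(dim=1, keepdim=True) / self.qmax).clamp(min=1e-8)
            q = torch.clamp(torch.round(w / scale), self.qmin, self.qmax) * scale
            # Straight-through estimator: forward uses quantized weights,
            # backward treats the rounding as identity so gradients still flow.
            return w + (q - w).detach()

        def forward(self, x):
            return x @ self.fake_quant(self.weight).t()

    layer = FakeQuantLinear(16, 8)
    loss = layer(torch.randn(4, 16)).pow(2).mean()
    loss.backward()  # gradients reach layer.weight despite the rounding step

Post-training quantization of an ordinary fp16 checkpoint skips that rounding step during training, which is why it tends to lose noticeably more quality at 4 bits.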

The model is clearly heavily fine-tuned towards coding and math, and is borderline unusable for creative writing, and for translation in particular. It's not general-purpose and is excessively filtered (refusal training and dataset lobotomy are probably major factors behind its lower-than-expected performance), so it shouldn't be compared with Qwen or o3 at all.



