Tried an English-to-Greek translation with the smaller one. Results were hideous. Mistral Small is leaps and bounds better. Also, I don't get why they ship 4-bit quantization by default. In my experience, anything below 8-bit and the model fails to follow long prompts. They gutted their own models.
They used quantization-aware training, so the quality loss should be negligible. Doing anything with this model's weights would be a different story, though.
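Roughly, QAT means the model is fine-tuned while simulating the 4-bit rounding in its own forward pass, so it learns to compensate for the quantization error before release. A minimal sketch of the core fake-quantization trick (illustrative only, not their actual recipe):

```python
import torch

def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    # Round weights to a signed low-bit grid in the forward pass.
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for signed 4-bit
    scale = w.abs().max() / qmax                  # per-tensor scale, simplest choice
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    # Straight-through estimator: forward uses the quantized weights,
    # backward treats the rounding as identity so gradients still flow.
    return w + (w_q - w).detach()
```

Because training "sees" the quantization error, the released 4-bit checkpoint loses far less quality than quantizing a full-precision model after the fact.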
The model is clearly heavily fine-tuned towards coding and math, and is borderline unusable for creative writing and translation in particular. It's not general-purpose, it's excessively filtered (refusal training and dataset lobotomy are probably major factors behind the lower-than-expected performance), and it shouldn't be compared with Qwen or o3 at all.