Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It would be interesting to try, but for the Aider benchmark, the dense 32B model scores 50.2 and the 30B-A3B doesn't publish the Aider benchmark, so it may be poor.


Is that Qwen 2.5 or Qwen 3? I don't see a qwen 3 on the aider benchmark here yet: https://aider.chat/docs/leaderboards/


As a human who asks AI to edit upto 50 SLOC at a time, is there value in models which score less than 50%? Im using the `gemini-2.0-flash-001` though.


The aider score mentioned in GP was published by Alibaba themselves, and is not yet on aider's leaderboard. The aider team will probably do their own tests and maybe come up with a different score.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: