Absolutely humongous model. Mixture of 256 experts, with 8 activated per token.
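To make "256 experts with 8 activated" concrete, here is a minimal sketch of top-k expert routing. It is an illustration only, not DeepSeek's actual router: the random scores stand in for a learned gating network, and the softmax over the selected experts is an assumption. The names `route`, `NUM_EXPERTS`, and `TOP_K` are hypothetical.

```python
import math
import random

NUM_EXPERTS = 256  # expert count from the text
TOP_K = 8          # experts activated per token, from the text


def route(scores, k=TOP_K):
    """Pick the k highest-scoring experts and normalize their weights."""
    # Indices of the k largest scores, best first.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores only (assumed here for illustration).
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Stand-in for a learned gate: one random score per expert for a single token.
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

chosen = route(scores)
print(len(chosen), "of", NUM_EXPERTS, "experts active")
```

The point of this design is that only the chosen experts run for a given token, so the compute per token is a small fraction of what the total parameter count suggests.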
Aider leaderboard:
The only model above DeepSeek v3 here is ~~Open~~AI o1. DeepSeek is known to make amazing models and Aider rotates their benchmark over time, so it is unlikely that this is a train-on-benchmark situation.