It is powered by the open-source DeepSeek-V3 model, which its researchers claim was developed for less than $6m - significantly less than the billions spent by rivals.
It'll be interesting to see if this model was so cheap because the Chinese skipped years of development and got a jump start by stealing tech from other AI companies.
It cost so little because all the previous open-source work was already done, and a lot of the research work had already been knocked out. Building models isn't the time-consuming process it used to be; it's the training, testing, retraining loop that's expensive.
If you're just building a model that is focused on specific things (like coding, math, and logic), then you don't need large swathes of content from the internet; you can just train it on already solved, freely available information. If you want to piss away money on an LLM that also knows how many celebrities each celebrity has diddled, well, that costs a lot more to make.
It lowered training costs by quite a bit. To learn from preference data (what's termed alignment with human values), we used a very large reward model as a proxy for human feedback.
They completely got rid of this, and with it the need for very large clusters.
This has serious implications for spending, though. Big companies that would otherwise have had to train foundation models because they couldn't directly use Meta's Llama can now just use DeepSeek
and move directly to the human/customer alignment phase, which was already significantly cheaper than pretraining (the first phase of foundation model training). With their new algorithm, even that later stage does not need huge compute.
So they definitely got rid of a big chunk of compute by not relying on what is called a "reward" model.
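To make the idea concrete, here is a rough sketch of what training feedback without a learned reward model could look like. The thread doesn't say which algorithm is used, so everything here is an assumption for illustration: the `<answer>` tag format, the score values, and the function names `rule_based_reward` and `group_relative_advantages` are all made up. The sketch scores each sampled answer with fixed rules, then compares each score against the rest of its group, so no second large network has to be held in memory.

```python
import re
import statistics


def rule_based_reward(completion: str, expected_answer: str) -> float:
    """Score a completion with fixed rules instead of a learned reward model.

    Hypothetical scoring scheme: 1.0 for a correct tagged answer,
    0.1 for a well-formed but wrong answer, 0.0 if no answer tag is found.
    """
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0
    answer = match.group(1).strip()
    return 1.0 if answer == expected_answer else 0.1


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize rewards within one group of samples for the same prompt.

    Each sample is judged relative to its siblings, which replaces the
    baseline a separate model would otherwise provide.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All samples scored the same, so none is preferred over another.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Four sampled completions for the same (made-up) prompt "What is 2 + 2?"
completions = [
    "<answer>4</answer>",
    "<answer>5</answer>",
    "I think it is four",
    "<answer>4</answer>",
]
rewards = [rule_based_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)
```

Samples with a positive advantage would be reinforced and those with a negative one discouraged. This only works where answers can be checked mechanically, which fits the coding, math, and logic focus mentioned above.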
Unfortunately, that's not very clear without more detail. What kind of reward model are they talking about?
This is potentially a 1000x difference in required resources, assuming you believe DeepSeek's quoted figure for spending, so it would have to be an extraordinary change.
It'll be interesting to see if this model was so cheap because the Chinese skipped years of development and got a jump start by stealing tech from other AI companies.
Even if that were true, it's fair game. After all, the OpenAI models etc. are entirely based on stolen content as well.
From someone in the field
https://github.com/huggingface/open-r1