It lowered training costs by quite a bit. To learn from preference data (what's termed alignment with human values), we used a very large reward model as a proxy for human feedback.
They completely got rid of this, and hence also the need for very large clusters.
This has serious implications for spending, though. Big companies that would have had to train their own foundation models because they couldn't directly use Meta's Llama can now just use DeepSeek
and move directly to the human/customer alignment phase, which was already significantly cheaper than pretraining (the first phase of foundation model training). With their new algorithm, even that later stage does not need huge compute,
so they definitely got rid of a big chunk of compute by not relying on what is called a “reward” model.
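To make the reward-model point concrete, here is a minimal, hypothetical sketch of the contrast being described: in the classic RLHF recipe, every candidate response is scored by a separate large learned reward model (another expensive forward pass on big hardware), whereas a rule-based reward is just a cheap programmatic check. The function names and the boxed-answer format below are illustrative assumptions, not DeepSeek's actual code.

```python
# Toy illustration (not DeepSeek's actual implementation): contrasts the
# classic RLHF reward-model step with a cheap rule-based check.
import re


def score_with_reward_model(prompt: str, response: str) -> float:
    """Classic RLHF: a separate, large learned model scores each response.

    In practice this is another multi-billion-parameter network trained on
    human preference pairs, so every scoring call is a forward pass on big
    hardware. Stubbed out here with a constant.
    """
    return 0.0  # placeholder for an expensive neural-network forward pass


def rule_based_reward(response: str, reference_answer: str) -> float:
    """Rule-based alternative: a programmatic check, no learned model.

    For verifiable tasks (math, code), the final answer can be extracted
    and compared against a known reference at essentially no compute cost.
    """
    match = re.search(r"\\boxed\{([^}]*)\}", response)
    extracted = match.group(1).strip() if match else ""
    return 1.0 if extracted == reference_answer else 0.0


if __name__ == "__main__":
    response = r"Adding the two numbers gives \boxed{42}"
    print(rule_based_reward(response, "42"))  # 1.0 -> answer matches
    print(rule_based_reward(response, "41"))  # 0.0 -> answer does not match
```

The point of the sketch is only the relative cost: the first function stands in for a second large model you must train and serve, the second for a check that runs on a CPU in microseconds.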
Unfortunately, that's not very clear without more detail. What kind of reward model are they talking about?
This is potentially a 1000x difference in required resources, assuming you believe DeepSeek's quoted figure for spending, so it would have to be an extraordinary change.
From someone in the field:
https://github.com/huggingface/open-r1