[-] Lugh@futurology.today 56 points 3 days ago* (last edited 3 days ago)

DeepSeek buzz puts tech stocks on track for $1.2 trillion drop

Just a few months ago many American commenters thought their country was 'years ahead' of China when it came to AI dominance. That narrative has been blown out of the water.

[-] CanadaPlus@lemmy.sdf.org 7 points 2 days ago* (last edited 2 days ago)

It was kind of a weird take to be so confident in, honestly. Technical information famously leaks like a sieve to China, and all they needed was a few gigabytes of weights to roll their own.

Anyway, I sure am glad my portfolio is elsewhere.

[-] Sabata11792@ani.social 8 points 3 days ago

"There is no mote"

[-] Absaroka@lemmy.world 34 points 3 days ago

It is powered by the open-source DeepSeek-V3 model, which its researchers claim was developed for less than $6m - significantly less than the billions spent by rivals.

It'll be interesting to see if this model was so cheap because the Chinese skipped years of development and got a jump start by stealing tech from other AI companies.

[-] Duke_Nukem_1990@feddit.org 15 points 3 days ago

Even if that was true, it's fair game. After all, the OpenAI models etc. are entirely based on stolen content as well.

[-] just_another_person@lemmy.world 18 points 3 days ago* (last edited 3 days ago)

It cost so little because all the previous open-source work was already done, and a lot of the research had already been knocked out. Building models isn't the time-consuming process it used to be; it's the training, testing, retraining loop that's expensive.

If you're just building a model focused on specific things, like coding, math, and logic, then you don't need large swathes of content from the internet; you can just train it on already-solved, freely available information. If you want to piss away money on an LLM that also knows how many celebrities each celebrity has diddled, well, that costs a lot more to make.
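
As a rough illustration of what "train on already-solved, freely available information" can look like with off-the-shelf tools (this is not DeepSeek's recipe, just a minimal sketch using a small GPT-2 and the GSM8K grade-school math dataset as stand-ins):

```python
# Minimal sketch: fine-tune a small open model on an already-solved problem set
# instead of scraping the whole internet. Model and dataset choices are examples only.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Every example already contains the worked solution, so no expensive labelling step.
ds = load_dataset("openai/gsm8k", "main", split="train")

def to_text(example):
    return tokenizer(example["question"] + "\n" + example["answer"],
                     truncation=True, max_length=512)

ds = ds.map(to_text, remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="math-tune",
                           per_device_train_batch_size=4,
                           num_train_epochs=1),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the train/evaluate/retrain cycle is where the real cost lives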

[-] Glasgow@lemmy.ml 9 points 2 days ago

From someone in the field:

It lowered training costs by quite a bit. To learn from preference data (what's termed alignment with human values), we used a very large reward model as a proxy for human feedback.

They completely got rid of this, and hence also the need for very large clusters.

This has serious implications for spending, though. Big companies that would have had to train foundation models because they couldn't directly use Meta's Llama can now just use DeepSeek,

and directly move to the human/customer alignment phase, which was already significantly cheaper than pretraining (the first phase of foundation model training). With their new algorithm, even that later stage doesn't need huge compute,

so they definitely got rid of a big chunk of compute by not relying on what is called a "reward" model.

GRPO: group relative policy optimization

Hugging Face is trying to replicate their results:

https://github.com/huggingface/open-r1
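
For the curious, a rough sketch of the group-relative advantage trick at the heart of GRPO (illustrative PyTorch, not DeepSeek's actual code; it assumes rule-based scoring of each sampled answer, e.g. an answer checker, in place of a learned reward/value model):

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scores for each sampled answer,
    e.g. from a rule-based check (did the math answer match? did the code pass tests?).

    Each answer's advantage is its reward normalized against the other answers
    sampled for the *same* prompt, so no separate reward/value network is needed
    as a baseline."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Toy example: one prompt, four sampled answers, only the last two judged correct.
rewards = torch.tensor([[0.0, 0.0, 1.0, 1.0]])
print(group_relative_advantages(rewards))
# Correct answers get positive advantages, incorrect ones negative; the policy is
# then nudged towards the higher-scoring samples (with a KL penalty keeping it
# close to the reference model).
```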

[-] CanadaPlus@lemmy.sdf.org 2 points 2 days ago* (last edited 2 days ago)

Unfortunately, that's not very clear without more detail. What kind of reward model are they talking about?

This is potentially a 1000x difference in required resources, assuming you believe DeepSeek's quoted figure for spending, so it would have to be an extraordinary change.

[-] hash@slrpnk.net 11 points 2 days ago

Took a look at the license. It's a custom one but appears far more permissive than Llama's. Mostly just safety restrictions and things like no military use.

Yeah, well, obviously sending all your private data to OpenAI's servers is maybe not as good an offer as these tech bros thought it would be ... eventually self-hosting AI will be a necessity, or lots of companies won't be able to use it seriously...

[-] Espiritdescali@futurology.today 1 points 2 days ago

How many TPUs do you need to run these latest models locally?

[-] SoftestSapphic@lemmy.world 7 points 2 days ago

https://www.deepseek.com/

It's free

Aaaaaaannd it's already under a DDoS attack lol

[-] Espiritdescali@futurology.today 10 points 3 days ago

The fact that stories like this are breaking into mainstream media just shows how much of an effect they will have on the world.

[-] LordKitsuna@lemmy.world 1 points 2 days ago

Filled with censors and limits, just like GPT. Wake me up when there's a high-quality model that isn't afraid to talk about tits and penises.

[-] x00z@lemmy.world 1 points 2 days ago

I'm sure the story behind "cheap" is the same as that of Chinese metal: sell it really cheap, with state-subsidized money, to conquer the markets.

[-] CanadaPlus@lemmy.sdf.org 1 points 2 days ago* (last edited 2 days ago)

Maybe? I also suspect espionage, because it'd be a relatively easy thing to steal and then fine-tune to look like your own. Cheaper labour is and was the main driver of China's growth, though - otherwise they wouldn't have the budget to subsidise much of anything.

[-] x00z@lemmy.world 1 points 2 days ago

I'm sure there are already people trying to figure out if it's a derivative.

[-] CanadaPlus@lemmy.sdf.org 1 points 1 day ago* (last edited 1 day ago)

Yes, for sure. If they did copy it, and did it well, though, there really won't be a way to tell. If you already have a working full set of weights, fine-tuning a slightly different net is a much smaller job, compute-wise.
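
Just to illustrate the compute asymmetry (a hypothetical example using the Hugging Face API, nothing to do with how DeepSeek was actually built):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Pretraining from scratch: random weights, needs the full multi-cluster training run.
config = AutoConfig.from_pretrained("gpt2")
from_scratch = AutoModelForCausalLM.from_config(config)

# Starting from someone else's published weights: only a comparatively tiny
# fine-tuning run is needed to make it look and behave like "your own" model.
warm_start = AutoModelForCausalLM.from_pretrained("gpt2")
```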
