DeepSeek dropped the V3.1 Weight (huggingface.co)

submitted 5 days ago by pepperfree@sh.itjust.works to c/localllama@sh.itjust.works


Not what we expected...


[-] deafboy@lemmy.world 15 points 5 days ago

Honestly, did the word "drop" change meaning in the past few months, or am I just crazy?

[-] terminatortwo@piefed.social 14 points 5 days ago* (last edited 5 days ago)

Not really, it’s just more common!

Drop is a contronym: it can mean its own opposite, and its use as both “disappear” and “appear” extends waaay back. E.g., usages such as “drop a line” or “drop a letter” go back as far as the 1700s.

So just a different line in a long history of drops.

https://www.oed.com/dictionary/drop_v

[-] vividspecter@aussie.zone 5 points 5 days ago* (last edited 5 days ago)

More like in the past 10 years or so, but yes. On a side note, "leaked" has also subtly changed meaning from the actual product (such as a game being released early through piracy etc) to just information about the product.

[-] bobo@lemmy.ml 5 points 5 days ago

"leaked" has also subtly changed meaning from the actual product (such as a game being released early through piracy etc) to just information about the product.

Just no... Information getting out despite attempts to conceal it has been the secondary meaning of that verb since at least the 19th century.

[-] vividspecter@aussie.zone 1 points 5 days ago

Thanks for the correction. The latter meaning did gain more prominence in recent years in my perception, but it's not surprising that it already existed in the past.

[-] afk_strats@lemmy.world 3 points 4 days ago

It picked up a lot of use by GenZ+, especially through the phrase "New {something} just dropped".

[-] SlartyBartFast@sh.itjust.works 1 points 2 days ago

Goddamn kids

[-] hummingbird@lemmy.world 3 points 5 days ago

It's not you

[-] fibojoly@sh.itjust.works 1 points 4 days ago

Depends where they dropped the thing. In your lap, or in a bin? Context, as always, is key.

[-] 0x01@lemmy.ml 7 points 5 days ago

DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Compared to the previous version, this upgrade brings improvements in multiple aspects:

Hybrid thinking mode: One model supports both thinking mode and non-thinking mode by changing the chat template.

Smarter tool calling: Through post-training optimization, the model's performance in tool usage and agent tasks has significantly improved.

Higher thinking efficiency: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly.

The tool calling improvements are very welcome
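To make the "changing the chat template" point concrete, here is a minimal sketch of how a single model could be switched between the two modes purely by how the prompt ends. The role markers below are simplified placeholders, not DeepSeek's real special tokens, and the choice of an opening `<think>` tag for thinking mode versus a closing `</think>` tag for non-thinking mode is an assumption about the template; `build_prompt` is a hypothetical helper, so check the model card for the actual format.

```python
def build_prompt(query: str, thinking: bool, system: str = "") -> str:
    """Build a single-turn prompt whose ending selects the mode.

    Illustrative only: the tags are placeholders, not the real
    DeepSeek-V3.1 special tokens.
    """
    # Placeholder role markers (assumption, for readability).
    user_tag = "<User>"
    assistant_tag = "<Assistant>"

    # Assumption: thinking mode leaves a reasoning block open for the model
    # to fill in, while non-thinking mode closes it immediately so the model
    # goes straight to the answer.
    think_tag = "<think>" if thinking else "</think>"

    return f"{system}{user_tag}{query}{assistant_tag}{think_tag}"


if __name__ == "__main__":
    print(build_prompt("What is 2 + 2?", thinking=True))
    print(build_prompt("What is 2 + 2?", thinking=False))
```

The weights are the same in both calls; only the last tag of the prompt differs, which is why one checkpoint can serve both modes.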

[-] pepperfree@sh.itjust.works 1 points 5 days ago

I wonder if we can extend the context length. It's already fine-tuned with YaRN, so we can't get a free extension with that method.
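For anyone unfamiliar with what YaRN changes, below is a rough sketch of its "NTK-by-parts" rescaling of the RoPE frequencies. All parameter values (`head_dim`, `base`, `scale`, `orig_ctx`, `alpha`, `beta`) are illustrative assumptions, not DeepSeek's actual configuration, and the sketch leaves out YaRN's attention temperature adjustment.

```python
import math


def yarn_inv_freq(head_dim=64, base=10000.0, scale=4.0,
                  orig_ctx=4096, alpha=1.0, beta=32.0):
    """Return YaRN-style rescaled RoPE frequencies (NTK-by-parts).

    Illustrative defaults only; not taken from any DeepSeek config.
    """
    freqs = []
    for i in range(0, head_dim, 2):
        theta = base ** (-i / head_dim)      # standard RoPE frequency
        wavelength = 2 * math.pi / theta
        # How many full rotations this dimension makes over the
        # original context window.
        r = orig_ctx / wavelength

        # Ramp: 0 -> fully interpolate, 1 -> leave the frequency untouched.
        if r < alpha:
            gamma = 0.0
        elif r > beta:
            gamma = 1.0
        else:
            gamma = (r - alpha) / (beta - alpha)

        # Blend the interpolated frequency (theta / scale) with the original.
        freqs.append((1 - gamma) * theta / scale + gamma * theta)
    return freqs


if __name__ == "__main__":
    scaled = yarn_inv_freq()
    print(scaled[0], scaled[-1])
```

High-frequency dimensions are left alone and low-frequency ones are stretched by `scale`. A model that has already been fine-tuned under such a scaling has that stretch baked in, which is the concern raised above about reusing the same method.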

[-] FaceDeer@fedia.io 7 points 5 days ago

What's not what we expected?

[-] pepperfree@sh.itjust.works 1 points 5 days ago

Everybody's been spreading rumors about R2, so releasing this was kinda unexpected.

this post was submitted on 21 Aug 2025

40 points (100.0% liked)
