Amazing!
...But I hope they don't remain so adversarial. Base DeepSeek and Qwen 2.5 are permissively licensed (MIT and Apache 2.0, respectively), like many other open models, and there is no shame in continuing to pretrain one of them instead of wasting millions trying to reinvent the wheel. Then you get the benefit of a more compatible tokenizer and existing support in backends, too.
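For anyone unfamiliar, continued pretraining really is that straightforward with off-the-shelf tooling. A minimal sketch with Hugging Face Transformers, assuming a Qwen 2.5 base checkpoint; the model name, corpus, and hyperparameters here are illustrative placeholders, not a tuned recipe:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "Qwen/Qwen2.5-7B"  # Apache 2.0 base checkpoint (assumed name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Any plain-text corpus works; wikitext just stands in for your own data.
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
raw = raw.filter(lambda ex: ex["text"].strip())  # drop empty lines

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen2.5-continued",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-5,  # low LR so the base model isn't clobbered
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    # mlm=False gives the standard next-token (causal) LM objective
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The point being: the tokenizer, architecture config, and inference backends all come for free, which is exactly what you throw away by training from scratch.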
Everything is kinda based on American code from Hugging Face anyway. And DeepSeek (like other Chinese firms) isn't hostile to collaboration; in fact, they seem to like "having their cake and eating it," leaving their models relatively "uncensored" under the hood while squeaking by whatever restrictions they're under.
It'd be awesome if they did something like a BitNet model, or an "alternative attention" hybrid (linear-attention or state-space layers mixed in with standard attention), but few institutions have taken that financial risk at scale so far.
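For reference, the core trick in BitNet b1.58 is just "absmean" ternary weight quantization. A minimal sketch of that rounding step alone, not a full trainable BitLinear layer; names here are mine, not from any released implementation:

```python
import torch

def absmean_ternary(w: torch.Tensor, eps: float = 1e-5):
    """Quantize weights to {-1, 0, +1} with one per-tensor scale."""
    scale = w.abs().mean().clamp(min=eps)   # gamma = mean(|W|)
    w_q = (w / scale).round().clamp(-1, 1)  # RoundClip to ternary values
    return w_q, scale                       # dequantize as w_q * scale

w = torch.randn(4, 4)
w_q, scale = absmean_ternary(w)
print(w_q)                             # entries are only -1.0, 0.0, or 1.0
print((w - w_q * scale).abs().mean())  # rough quantization error
```

The risk isn't conceptual complexity, it's that you only find out whether quality holds up after committing a full pretraining budget to it.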