Release of DeepSeek-R1T-Chimera

May 2nd, 2025

Over the weekend, we released DeepSeek-R1T-Chimera, an open-weights model that adds R1 reasoning to DeepSeek's V3-0324. In benchmarks, it appears to be as smart as R1 but much faster, using 40% fewer output tokens.

We applied a novel construction method: the Chimera is a child LLM that uses V3's shared experts, augmented with a custom merge of R1's routed experts. It is neither a fine-tune nor a distillation; it is constructed directly from neural network parts of both parent MoE models.
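As a rough illustration of what assembling a child model from two parents' parts can look like, here is a minimal Python sketch. It is not the actual merge recipe: the tensor names, the `is_routed_expert` naming rule, the linear interpolation controlled by `r1_weight`, and the choice to copy every other tensor from V3 are all assumptions made for the example, and plain lists of floats stand in for real weight tensors.

```python
from typing import Dict, List

Tensor = List[float]  # stand-in for a real weight tensor


def is_routed_expert(name: str) -> bool:
    """Hypothetical naming rule: routed experts live under '.experts.',
    shared experts under '.shared_experts.'."""
    return ".experts." in name and ".shared_experts." not in name


def build_child(
    v3: Dict[str, Tensor],
    r1: Dict[str, Tensor],
    r1_weight: float = 1.0,
) -> Dict[str, Tensor]:
    """Assemble a child state dict from two parents with identical layouts.

    Routed-expert tensors are a linear blend of both parents (an assumed
    merge rule; r1_weight=1.0 takes them purely from R1). Everything else,
    including the shared experts, is copied from V3 (also an assumption).
    """
    if v3.keys() != r1.keys():
        raise ValueError("parent models must have the same tensor names")

    child: Dict[str, Tensor] = {}
    for name, v3_tensor in v3.items():
        if is_routed_expert(name):
            r1_tensor = r1[name]
            child[name] = [
                r1_weight * b + (1.0 - r1_weight) * a
                for a, b in zip(v3_tensor, r1_tensor)
            ]
        else:
            child[name] = list(v3_tensor)
    return child


# Toy parents with made-up tensor names and values.
v3_parent = {
    "layers.3.mlp.shared_experts.up_proj": [1.0, 2.0],
    "layers.3.mlp.experts.0.up_proj": [0.0, 0.0],
    "layers.3.self_attn.q_proj": [5.0],
}
r1_parent = {
    "layers.3.mlp.shared_experts.up_proj": [9.0, 9.0],
    "layers.3.mlp.experts.0.up_proj": [4.0, 8.0],
    "layers.3.self_attn.q_proj": [7.0],
}

child = build_child(v3_parent, r1_parent, r1_weight=1.0)
```

The point of the sketch is only that no gradient step or training data is involved: the child is produced by selecting and combining existing tensors.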

Surprisingly, during our experiments, we did not detect any defects in the hybrid child model. Instead, its reasoning and thinking processes appear to be more compact and orderly than the sometimes very long and wandering thoughts of the R1 parent model.

The model merge attracted considerable attention in the community. It was picked up and hosted by OpenRouter, where it was temporarily ranked as the #2 trending model, with more than 1 billion tokens processed so far.

You can try the model merge yourself.

The weights are on Hugging Face.

We want to thank DeepSeek for making the parent models V3 and R1 available and making such research possible.
