OK, so for V3, this is what they have on GitHub:
Source: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/README_WEIGHTS.md

The DeepSeek-V3 weight file consists of two main components: Main Model Weights and MTP Modules.
1. Main Model Weights
- Composition:
  - Input/output embedding layers and a complete set of 61 Transformer hidden layers.
- Parameter Count:
  - Total parameters: 671B
  - Activation parameters: 36.7B (including 0.9B for Embedding and 0.9B for the output Head).

Structural Details
- Embedding Layer:
  - model.embed_tokens.weight
- Transformer Hidden Layers:
  - model.layers.0 to model.layers.60, totaling num_hidden_layers layers.
- Output Layer:
  - model.norm.weight
  - lm_head.weight
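
For anyone who wants to check a list of tensor names against the layout quoted above, here is a rough Python sketch. It only uses the names that appear in the quote (model.embed_tokens.weight, model.layers.0 to model.layers.60, model.norm.weight, lm_head.weight). The function names classify and summarize and the per-layer suffix "some_tensor" are made up for illustration; they are not from the DeepSeek repo, and real checkpoints will have their own per-layer tensor names.

```python
import re
from collections import Counter

# From the quoted README: model.layers.0 .. model.layers.60, i.e. 61 layers.
NUM_HIDDEN_LAYERS = 61

# Matches "model.layers.<index>." at the start of a tensor name.
LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def classify(name, num_hidden_layers=NUM_HIDDEN_LAYERS):
    """Map a tensor name to the README component it belongs to."""
    if name == "model.embed_tokens.weight":
        return "embedding"
    if name in ("model.norm.weight", "lm_head.weight"):
        return "output"
    match = LAYER_RE.match(name)
    if match and int(match.group(1)) < num_hidden_layers:
        return "hidden_layer"
    # Anything else (for example the MTP modules) is outside the quoted list.
    return "other"


def summarize(names):
    """Count how many tensor names fall into each component."""
    return Counter(classify(n) for n in names)


# Illustrative names only; the per-layer suffix is a hypothetical placeholder.
example = [
    "model.embed_tokens.weight",
    "model.layers.0.some_tensor.weight",
    "model.layers.60.some_tensor.weight",
    "model.layers.61.some_tensor.weight",
    "model.norm.weight",
    "lm_head.weight",
]
print(summarize(example))
```

Feeding it the real tensor names from a checkpoint index would show quickly whether all 61 hidden layers plus the embedding and output tensors are actually present.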
So they supply the full network.
The R1 GitHub entry is just documentation and no code from what I can see.