ChatGPT - Seriously good potential (or just some Internet fun)

OK, so for V3 on GitHub they have:

The DeepSeek-V3 weight file consists of two main components: Main Model Weights and MTP Modules.

1. Main Model Weights​

  • Composition:
    • Input/output embedding layers and a complete set of 61 Transformer hidden layers.
  • Parameter Count:
    • Total parameters: 671B
    • Activation parameters: 36.7B (including 0.9B for Embedding and 0.9B for the output Head).

Structural Details​

  • Embedding Layer:
    • model.embed_tokens.weight
  • Transformer Hidden Layers:
    • model.layers.0 to model.layers.60, totaling num_hidden_layers layers.
  • Output Layer:
    • model.norm.weight
    • lm_head.weight
Source: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/README_WEIGHTS.md
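As a quick sanity check on that structure, here's a minimal sketch, assuming the weights follow the standard Hugging Face sharded-safetensors layout (a model.safetensors.index.json file alongside the shards); the local path is just a placeholder:

```python
import json
from collections import Counter

# Placeholder path - point this at wherever the downloaded weights live.
WEIGHTS_DIR = "/data/models/DeepSeek-V3"

# The sharded-safetensors index maps every parameter name to its shard file.
with open(f"{WEIGHTS_DIR}/model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

layer_ids = set()
other = Counter()
for name in weight_map:
    if name.startswith("model.layers."):
        layer_ids.add(int(name.split(".")[2]))
    elif name.startswith("model."):
        other[name.split(".")[1]] += 1   # embed_tokens, norm, ...
    else:
        other[name.split(".")[0]] += 1   # lm_head

print(f"hidden layers: {min(layer_ids)}..{max(layer_ids)} ({len(layer_ids)} total)")
print("other components:", dict(other))
```

If the repo's description is right, that should report layers 0..60 (61 total) plus embed_tokens, norm and lm_head.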

So they supply the full network.

The R1 GitHub entry is just documentation, with no code as far as I can see.
 
The model weights are also available. I expect the whole model to be on AWS Bedrock within a few months, and probably a guide to deploy it on SageMaker within days.


Yep, now on AWS Bedrock.

Will go and play.
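For anyone else wanting to play: a minimal sketch using boto3's Converse API. The region and model ID are assumptions on my part, so check what Bedrock actually lists in your account before running it.

```python
import boto3

# Region and model ID are assumptions - check the Bedrock console for the
# exact DeepSeek-R1 model ID enabled in your account.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="us.deepseek.r1-v1:0",  # assumed ID, verify in your console
    messages=[{
        "role": "user",
        "content": [{"text": "In one paragraph, what are MTP modules in DeepSeek-V3?"}],
    }],
    inferenceConfig={"maxTokens": 512, "temperature": 0.6},
)

# R1 responses can include a reasoning block as well as the answer text,
# so only print the plain-text parts.
for block in response["output"]["message"]["content"]:
    if "text" in block:
        print(block["text"])
```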
 
I'm guessing any uni IT dept worth its salt will have downloaded a copy for staff/student use.
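If you do want your own copy, the usual route is snapshot_download from huggingface_hub; a rough sketch below (the local_dir is a placeholder, and be warned the full set of shards runs to hundreds of GB):

```python
from huggingface_hub import snapshot_download

# Pulls the published R1 weight repo from the Hub. local_dir is a placeholder;
# expect several hundred GB of safetensors shards, so check your storage first.
snapshot_download(
    repo_id="deepseek-ai/DeepSeek-R1",
    local_dir="/data/models/DeepSeek-R1",
    allow_patterns=["*.json", "*.safetensors"],  # config + weights only
)
```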
 

The training commands below are configured for a node of 8 x H100s (80GB). For different hardware and topologies, you may need to tune the batch size and number of gradient accumulation steps.

So that's one node stuffed with eight H100s at roughly £20K apiece, then...
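On the "tune the batch size and number of gradient accumulation steps" point: the quantity to hold constant when you move to different hardware is the effective batch size. A back-of-the-envelope sketch (the numbers are illustrative, not taken from the repo's configs):

```python
# Effective batch = per-device batch x number of GPUs x gradient accumulation steps.
def grad_accum_steps(target_effective_batch: int, per_device_batch: int, num_gpus: int) -> int:
    steps, remainder = divmod(target_effective_batch, per_device_batch * num_gpus)
    assert remainder == 0, "choose values that divide the target evenly"
    return steps

# Illustrative reference: 8 x H100, per-device batch 16, no accumulation -> 128.
target = 16 * 8 * 1

# Reproducing that on, say, 2 GPUs that only fit a per-device batch of 8:
print(grad_accum_steps(target, per_device_batch=8, num_gpus=2))  # -> 8
```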

Here {model} and {dataset} refer to the model and dataset IDs on the Hugging Face Hub, while {accelerator} refers to the choice of an Accelerate config file in configs.

So you'll need to get those from here: https://huggingface.co/docs/hub/en/datasets-overview. You can then filter with 'R1' in the search box and you'll find a load of R1-based models (with no control or indication of quality or source).
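If you'd rather not trawl the web UI, the same search can be done with huggingface_hub (the search term and sort order here are just my choices). It also illustrates the "no indication of quality or source" point: official DeepSeek repos come back mixed in with arbitrary community finetunes, so check the owning organisation before trusting anything.

```python
from huggingface_hub import HfApi

api = HfApi()

# Anything on the Hub with "R1" in the name, most-downloaded first.
# Official deepseek-ai repos appear alongside random finetunes and quantisations.
for model in api.list_models(search="R1", sort="downloads", direction=-1, limit=10):
    print(model.id, model.downloads)
```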
 