ChatGPT - Seriously good potential (or just some Internet fun)

OK, so for V3 on GitHub they have:

The DeepSeek-V3 weight file consists of two main components: Main Model Weights and MTP Modules.

1. Main Model Weights​

  • Composition:
    • Input/output embedding layers and a complete set of 61 Transformer hidden layers.
  • Parameter Count:
    • Total parameters: 671B
    • Activation parameters: 36.7B (including 0.9B for Embedding and 0.9B for the output Head).

Structural Details​

  • Embedding Layer:
    • model.embed_tokens.weight
  • Transformer Hidden Layers:
    • model.layers.0 to model.layers.60, totaling num_hidden_layers layers.
  • Output Layer:
    • model.norm.weight
    • lm_head.weight
Source: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/README_WEIGHTS.md
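The naming layout quoted above can be sketched in a few lines of Python. This is just an enumeration of the top-level weight keys described in README_WEIGHTS.md (embedding, 61 hidden layers, final norm, LM head); it doesn't load anything, and the per-layer sub-keys (attention, MoE experts, etc.) are omitted.

```python
# Enumerate the main-model weight key layout described in README_WEIGHTS.md.
# Top-level names only; each model.layers.{i} contains many sub-tensors.

NUM_HIDDEN_LAYERS = 61  # model.layers.0 .. model.layers.60


def main_model_keys(num_hidden_layers=NUM_HIDDEN_LAYERS):
    keys = ["model.embed_tokens.weight"]                      # embedding layer
    keys += [f"model.layers.{i}" for i in range(num_hidden_layers)]
    keys += ["model.norm.weight", "lm_head.weight"]           # output layer
    return keys


keys = main_model_keys()
assert keys[0] == "model.embed_tokens.weight"
assert keys[-1] == "lm_head.weight"
assert len(keys) == NUM_HIDDEN_LAYERS + 3  # embed + 61 layers + norm + head
```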

So they supply the full network.

The R1 GitHub repo is just documentation, with no code from what I can see.
 
The model weights are also available. I expect the whole model to be on AWS Bedrock within a few months, and probably a guide to deploy it on SageMaker within days.


Yep, now on AWS Bedrock.

Will go and play.
 

The training commands below are configured for a node of 8 x H100s (80GB). For different hardware and topologies, you may need to tune the batch size and number of gradient accumulation steps.

So that's eight £20K rack mounts, then...
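The "tune the batch size and number of gradient accumulation steps" advice above boils down to simple arithmetic: the effective (global) batch size is per-device batch size × gradient accumulation steps × number of GPUs, so with fewer GPUs you raise accumulation to keep it constant. A minimal sketch (the numbers here are illustrative, not the repo's actual config):

```python
# Effective (global) batch size for data-parallel training.
def effective_batch_size(per_device, grad_accum, num_gpus):
    return per_device * grad_accum * num_gpus


# Illustrative: 8 x H100 with per-device batch 16 and no accumulation
# gives the same global batch as 4 GPUs with accumulation of 2.
assert effective_batch_size(16, 1, 8) == 128
assert effective_batch_size(16, 2, 4) == 128
```

Memory per GPU scales with `per_device`, while `grad_accum` trades memory for wall-clock time, which is why it's the first knob to turn on smaller hardware.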

Here {model} and {dataset} refer to the model and dataset IDs on the Hugging Face Hub, while {accelerator} refers to the choice of an Accelerate config file in configs.

So you'll need to get those from here: https://huggingface.co/docs/hub/en/datasets-overview. You can then filter with 'R1' in the search box and you'll find a load of R1-based models (with no control or indication of quality or source).
 
My understanding is their full open-source release provided the weights, so you have full text-in/text-out capability? Happy to be corrected, but that is what I've seen discussed; I should have a look at the sources myself.

Yes, correct. Open source here refers to the model (though it doesn't necessarily, including in this case, include details of the dataset(s) used to train it). The issues with additional censorship models sitting outside the model itself to satisfy CCP rules (potentially updated in future, too), or indeed privacy issues re: retention of data, don't apply. A company could literally host the model themselves, even on consumer hardware; it doesn't necessarily need lots of expensive GPUs. Lots of regular RAM plus CPU would do the trick if using it internally rather than as part of the backend for serving many customers.

But many others will be hosting it too, including new modified/fine-tuned versions of the model (and smaller models distilled from it). The AWS issue you mention (if I'm reading it right) would seem to apply to any LLM, not just this particular one; it seems to be more an issue with bad consultants and AWS, and other hosting solutions are available. Likewise, with closed-source models accessed via an API (such as DeepSeek, or indeed Anthropic or OpenAI), you have potential issues like bad developers/consultants inadvertently revealing API keys and malicious people stealing from you or abusing your keys. It's certainly happened already with small developers/startups, and no doubt "consultant" types have made, or will make, similar mistakes. There are different trade-offs here. Open-source solutions like this certainly have their place, and lots of the more obvious criticisms of this particular app are in part shaped by people initially using/testing the hosted/API offering from DeepSeek themselves, and are easily mitigated. It doesn't take much to fine-tune this model or indeed use it for distillation of a smaller model.
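To put rough numbers on the "lots of regular RAM" point: the weights alone for a model this size are roughly parameter count × bits per weight. A back-of-envelope sketch (illustrative only; a real deployment also needs memory for the KV cache and activations, and quantised formats carry some overhead):

```python
# Rough memory footprint of model weights alone, in GiB.
def weights_gib(n_params_billion, bits_per_weight):
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# DeepSeek-V3's 671B parameters:
fp8 = weights_gib(671, 8)   # ~625 GiB at 8 bits per weight
q4 = weights_gib(671, 4)    # ~312 GiB at 4-bit quantisation
assert 600 < fp8 < 650
assert 300 < q4 < 325
```

So even heavily quantised, this particular model wants hundreds of GiB of RAM for CPU inference, which is feasible on a well-specced server but not a typical desktop; the distilled smaller models are the realistic consumer-hardware option.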
 
Temporarily Switched to Claude Haiku & Concise Responses
Due to high demand, Claude 3.5 Sonnet is temporarily unavailable for free plans. We've switched to Claude 3.5 Haiku and briefer responses.

Own up .. lol.
 
Hehe, hehe, my thoughts exactly.


Be a robber baron but don't complain if someone does something similar to you.

You want a competitive market? You got a competitive market thanks to Trump ;)

Been working through Claude.ai with my project. I have to say I'm impressed that it hasn't had a senile moment like ChatGPT has; the only issue is I've bumped into the chat limit on a complex task, so I may have to subscribe for a couple of months.
 

A complex problem that took microbiologists a decade to get to the bottom of has been solved in just two days by a new artificial intelligence (AI) tool.

Professor José R Penadés and his team at Imperial College London had spent years working out and proving why some superbugs are immune to antibiotics.

He gave "co-scientist" - a tool made by Google - a short prompt asking it about the core problem he had been investigating and it reached the same conclusion in 48 hours.

Pretty amazing
 
You want a competitive market? You got a competitive market thanks to Trump ;)

Been working through Claude.ai with my project. I have to say I'm impressed that it hasn't had a senile moment like ChatGPT has; the only issue is I've bumped into the chat limit on a complex task, so I may have to subscribe for a couple of months.
I'm using Qwen2.5 locally for coding assistance. Even the 1.5B model is pretty decent for autocompletion. Being able to interrogate a codebase is awesome.

Sure there's loads of gumbies out there trying to use AI as a substitute for knowledge, but used right, it's really useful assisting people that do already know what they're doing.
 
With the complaints from musicians about opting their music out from AI harvesting, I had wondered what OC does with the forum data?

Nonetheless, musicians seem hung up on AI plagiarising their living, without seeing the irony of much of the derivative work they create.
I think any true musical talent will still shine through; the streaming platforms mean new finds with genuine long-term ability get picked up.
AI probably won't be able to generate the online parasocial-relationship persona, though, which let's face it is why many people make money - not the music.
 
With the complaints from musicians about opting their music out from AI harvesting, I had wondered what OC does with the forum data?

Nonetheless, musicians seem hung up on AI plagiarising their living, without seeing the irony of much of the derivative work they create.
I think any true musical talent will still shine through; the streaming platforms mean new finds with genuine long-term ability get picked up.
AI probably won't be able to generate the online parasocial-relationship persona, though, which let's face it is why many people make money - not the music.
I tend to agree with this. The industry always talks about a formula, which is exactly what AI is good at, but I think the performer/personality still makes it work.

Many singers aren't necessarily the best singers, but they have the personality and looks that appeal to the audience.
 
AI probably won't be able to generate the online parasocial-relationship persona, though, which let's face it is why many people make money - not the music.

That is true, and whilst there's always going to be the reality that the 'AI artist' wouldn't actually exist as a physical entity, you will have a lot of human input filling in the gap for that failing, as best they can. For some consumers it would never be enough, but at some basic level, good sound is good sound.

I think the concern from musicians is that their cut of the pie will rapidly decrease: most obviously financially, but also in relevance and perhaps eventually legacy. The rate at which AI can pump out musical content compared to the tried-and-true methods is quite frightening.
 