ChatGPT - Seriously good potential (or just some Internet fun)

OK, so for V3 on GitHub they have:

The DeepSeek-V3 weight file consists of two main components: Main Model Weights and MTP Modules.

1. Main Model Weights​

  • Composition:
    • Input/output embedding layers and a complete set of 61 Transformer hidden layers.
  • Parameter Count:
    • Total parameters: 671B
    • Activation parameters: 36.7B (including 0.9B for Embedding and 0.9B for the output Head).

Structural Details​

  • Embedding Layer:
    • model.embed_tokens.weight
  • Transformer Hidden Layers:
    • model.layers.0 to model.layers.60, totaling num_hidden_layers layers.
  • Output Layer:
    • model.norm.weight
    • lm_head.weight
Source: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/README_WEIGHTS.md
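As a quick sanity check on that structure, here's a minimal sketch, assuming the weights follow the standard Hugging Face sharded-safetensors layout (a model.safetensors.index.json file alongside the shards); the local path is just a placeholder:

```python
import json
from collections import Counter

# Placeholder path - point this at wherever the downloaded weights live.
WEIGHTS_DIR = "/data/models/DeepSeek-V3"

# The sharded-safetensors index maps every parameter name to its shard file.
with open(f"{WEIGHTS_DIR}/model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

layer_ids = set()
other = Counter()
for name in weight_map:
    if name.startswith("model.layers."):
        layer_ids.add(int(name.split(".")[2]))
    elif name.startswith("model."):
        other[name.split(".")[1]] += 1   # embed_tokens, norm, ...
    else:
        other[name.split(".")[0]] += 1   # lm_head

print(f"hidden layers: {min(layer_ids)}..{max(layer_ids)} ({len(layer_ids)} total)")
print("other components:", dict(other))
```

If the repo's description is right, that should report layers 0..60 (61 total) plus embed_tokens, norm and lm_head.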

So they supply the full network.

The R1 GitHub entry is just documentation, with no code as far as I can see.
 
The model weights are also available. I expect the whole model to be on AWS Bedrock within a few months, and probably a guide to deploy it on SageMaker within days.


Yep, now on AWS Bedrock.

Will go and play.
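For anyone else wanting to play: a minimal sketch using boto3's Converse API. The region and model ID are assumptions on my part, so check what Bedrock actually lists in your account before running it.

```python
import boto3

# Region and model ID are assumptions - check the Bedrock console for the
# exact DeepSeek-R1 model ID enabled in your account.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="us.deepseek.r1-v1:0",  # assumed ID, verify in your console
    messages=[{
        "role": "user",
        "content": [{"text": "In one paragraph, what are MTP modules in DeepSeek-V3?"}],
    }],
    inferenceConfig={"maxTokens": 512, "temperature": 0.6},
)

# R1 responses can include a reasoning block as well as the answer text,
# so only print the plain-text parts.
for block in response["output"]["message"]["content"]:
    if "text" in block:
        print(block["text"])
```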
 
I'm guessing any uni IT dept worth its salt will have downloaded a copy for staff/student use.
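If you do want your own copy, the usual route is snapshot_download from huggingface_hub; a rough sketch below (the local_dir is a placeholder, and be warned the full set of shards runs to hundreds of GB):

```python
from huggingface_hub import snapshot_download

# Pulls the published R1 weight repo from the Hub. local_dir is a placeholder;
# expect several hundred GB of safetensors shards, so check your storage first.
snapshot_download(
    repo_id="deepseek-ai/DeepSeek-R1",
    local_dir="/data/models/DeepSeek-R1",
    allow_patterns=["*.json", "*.safetensors"],  # config + weights only
)
```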
 

The training commands below are configured for a node of 8 x H100s (80GB). For different hardware and topologies, you may need to tune the batch size and number of gradient accumulation steps.

So that's one node stuffed with eight H100s at roughly £20K apiece, then...
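On the "tune the batch size and number of gradient accumulation steps" point: the quantity to hold constant when you move to different hardware is the effective batch size. A back-of-the-envelope sketch (the numbers are illustrative, not taken from the repo's configs):

```python
# Effective batch = per-device batch x number of GPUs x gradient accumulation steps.
def grad_accum_steps(target_effective_batch: int, per_device_batch: int, num_gpus: int) -> int:
    steps, remainder = divmod(target_effective_batch, per_device_batch * num_gpus)
    assert remainder == 0, "choose values that divide the target evenly"
    return steps

# Illustrative reference: 8 x H100, per-device batch 16, no accumulation -> 128.
target = 16 * 8 * 1

# Reproducing that on, say, 2 GPUs that only fit a per-device batch of 8:
print(grad_accum_steps(target, per_device_batch=8, num_gpus=2))  # -> 8
```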

Here {model} and {dataset} refer to the model and dataset IDs on the Hugging Face Hub, while {accelerator} refers to the choice of an Accelerate config file in configs.

So you'll need to get those from here: https://huggingface.co/docs/hub/en/datasets-overview. You can then filter with 'R1' in the search box and you'll find a load of R1-based models (with no control or indication of quality or source).
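If you'd rather not trawl the web UI, the same search can be done with huggingface_hub (the search term and sort order here are just my choices). It also illustrates the "no indication of quality or source" point: official DeepSeek repos come back mixed in with arbitrary community finetunes, so check the owning organisation before trusting anything.

```python
from huggingface_hub import HfApi

api = HfApi()

# Anything on the Hub with "R1" in the name, most-downloaded first.
# Official deepseek-ai repos appear alongside random finetunes and quantisations.
for model in api.list_models(search="R1", sort="downloads", direction=-1, limit=10):
    print(model.id, model.downloads)
```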
 