ChatGPT - Seriously good potential (or just some Internet fun)

There's a paper that demonstrates that as LLMs get larger they become more inaccurate. So it's not a simple case of more is better.

That claim isn't so simple either, and the field has moved on a bit too. There are issues with transformers (as Yann LeCun has been vocal about), and it's true that theoretically errors can increase as LLMs increase in size - but in practice does that happen? Not necessarily: GPT-4 makes fewer errors than GPT-3, let alone GPT-2. And we've now had reasoning models for some months, incorporating planning and chain-of-thought reasoning, which also helps reduce errors - see the o1 and o3 models.
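For anyone curious what chain-of-thought prompting actually looks like in practice, here's a minimal sketch using the OpenAI Python client - the model name and the exact prompt wording are illustrative assumptions, not a recipe from any particular paper:

```python
# Minimal sketch: direct prompting vs chain-of-thought prompting.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment;
# the model name is an illustrative assumption.
from openai import OpenAI

client = OpenAI()
question = ("A bat and a ball cost £1.10 in total. The bat costs £1.00 "
            "more than the ball. How much does the ball cost?")

# Direct: one-shot answer, more prone to the intuitive-but-wrong reply (10p).
direct = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": question}],
)

# Chain-of-thought: asking for step-by-step working tends to cut errors,
# and reasoning models like o1/o3 effectively bake this step in.
cot = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": question + " Think step by step, then give the final answer."}],
)

print("Direct:", direct.choices[0].message.content)
print("CoT:   ", cot.choices[0].message.content)
```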

The models of AI are all closed and owned by US companies. This means the UK needs to start from scratch to make its own cloud and AI.

That's false too - I presume you mean generative AI? As far as LLMs are concerned, Meta has released open models; likewise see China and DeepSeek. DeepSeek-R1 is even a reasoning model to rival OpenAI's o1, though that particular model isn't open source yet (they apparently intend to release it, and their other models are open).

Likewise, on the imaging side, one of the early success stories was Stable Diffusion, granted Stability AI has fallen behind a bit recently.

Also generative AI only knows the data you have given it.

That's not quite true - the models can reason now and can make leaps from their training data. Of course, if a subject area is completely missing from the training data then not so much.
 
That claim isn't so simple either, and the field has moved on a bit too. There are issues with transformers (as Yann LeCun has been vocal about), and it's true that theoretically errors can increase as LLMs increase in size - but in practice does that happen? Not necessarily: GPT-4 makes fewer errors than GPT-3, let alone GPT-2. And we've now had reasoning models for some months, incorporating planning and chain-of-thought reasoning, which also helps reduce errors - see the o1 and o3 models.

However, for a non-trivial use case it does become a risk. Transformers are still improving, but like RNNs and LSTMs before them, I would bet that transformers will give way to a new technology.

That's false too - I presume you mean generative AI? As far as LLMs are concerned, Meta has released open models; likewise see China and DeepSeek. DeepSeek-R1 is even a reasoning model to rival OpenAI's o1, though that particular model isn't open source yet (they apparently intend to release it, and their other models are open).

Yes, there are open models. The AI Act basically says that as soon as models reach 10^25 floating-point operations for training they become a risk to security. Not many, if any, hit that limit at the moment, but the companies are closing up and commercialising due to (a) operational costs and (b) the looming regulation pushing large models to close.
Interesting with DeepSeek - a key issue for me is how viable non-commercialised open source is in the long term. If they take a Red Hat approach, they're relying on support revenue to cover massive costs. I don't think that will work long term due to the training energy costs.
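For a sense of scale on that 10^25 figure: the usual back-of-envelope estimate is training FLOPs ≈ 6 × parameters × training tokens, so you can roughly sanity-check where a model lands - the sizes and token counts below are illustrative assumptions, not official figures for any real model:

```python
# Back-of-envelope training compute: FLOPs ≈ 6 * parameters * training tokens.
# A widely used approximation; the model sizes and token counts below are
# illustrative assumptions, not official figures for any real model.
THRESHOLD = 1e25  # the AI Act's training-compute trigger

candidates = {
    "7B params, 2T tokens": (7e9, 2e12),
    "70B params, 15T tokens": (70e9, 15e12),
    "400B params, 15T tokens": (400e9, 15e12),
}

for name, (params, tokens) in candidates.items():
    flops = 6 * params * tokens
    status = "over" if flops > THRESHOLD else "under"
    print(f"{name}: ~{flops:.1e} FLOPs ({status} 1e25)")
# Only the largest training runs get anywhere near the threshold, which
# matches the point above that few, if any, models hit it today.
```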

Likewise, on the imaging side, one of the early success stories was Stable Diffusion, granted Stability AI has fallen behind a bit recently.
That's not quite true - the models can reason now and can make leaps from their training data. Of course, if a subject area is completely missing from the training data then not so much.

That's my differentiation: internal-dataset generative algorithms vs algorithms that use reasoning on top of that. Trend identification, agent testing of hypotheses etc. are all ways to create new data points, akin to R&D as it's done today - something I see as going beyond the current 'internal' data-point generation.
 
However, for a non-trivial use case it does become a risk.

Surely the opposite: it's in the trivial case of simply scaling the models that the issue occurs in theory. But look at the examples in reality - GPT-4 is less error-prone than GPT-3 despite being larger, because it's been fine-tuned, and in the non-trivial case you've got additional things like chain-of-thought that reduce errors.

Yes, there are open models. The AI Act basically says that as soon as models reach 10^25 floating-point operations for training they become a risk to security. Not many, if any, hit that limit at the moment, but the companies are closing up and commercialising due to (a) operational costs and (b) the looming regulation pushing large models to close.
Interesting with DeepSeek - a key issue for me is how viable non-commercialised open source is in the long term. If they take a Red Hat approach, they're relying on support revenue to cover massive costs. I don't think that will work long term due to the training energy costs.

Yeah, it's a bit of an open question. Meta aren't (currently) looking at serving models - they don't provide a cloud platform to businesses etc. - but they will use them internally, so it doesn't matter to them right now if they release the weights (if anything, it's helping them undermine the businesses of some big tech rivals). Google, Microsoft (via OpenAI) and Amazon all sell compute and will provide models via GCP, Azure and AWS, though there's nothing stopping people from using open models there too.

The EU Act doesn't prevent those models from being used; it requires registration of them etc. - the usual EU red tape. That might well discourage some open-source models from being deployed within the EU in future, but it doesn't apply elsewhere and doesn't stop the models from being created; they can't do anything about models trained in China, the UAE, the US etc.

In fact, DeepSeek exists despite the efforts of the US government - they have managed to build a reasoning model with GPUs that were supposed to have been embargoed (there are rumours they also made use of compute in the UAE).

Re: Red Hat - the underlying models themselves don't really need support per se, but the analogous revenue streams might be things like the provision of compute, or indeed consulting services. For example, the founder of Stability AI (before being ousted) had that sort of business plan: open models, with revenue generated from (essentially) consultancy/implementation.
 
DeepSeek-R1's thinking is amusing.


<details>

<summary>Thought for 179 seconds</summary>

Okay, so I just got a message from this user who initially asked me what 5 plus 5 equals. That seems straightforward enough, right? But then they follow up with "Well done thank you." Hmm, that's interesting. At first glance, it might seem like they're just being polite, maybe they were satisfied with my answer or they appreciate the way I explained it.

But wait, could there be more to this? Maybe they're testing me or trying to see if I can recognize when someone is being sarcastic or if there's an underlying tone. But 5 + 5 equals 10—I mean, that's basic math. It's hard to read too much into that initial question.

Perhaps the user is just making conversation or setting up for a more complex query. Maybe they're preparing me for something else by starting with a simple problem. Or maybe they were in a hurry and wanted to confirm their own math quickly.

I should consider all possibilities. It's possible they're using this as an opener before getting into something more involved. I don't want to assume too much, but being prepared is key here.

So, after acknowledging their thanks, I need to respond in a way that's both polite and open-ended, inviting them to continue if they have further questions or a different topic they'd like to discuss. That way, I cover all bases without overstepping or making assumptions about their intentions.

Yeah, that makes sense. Keeping the conversation flowing by being responsive and open is probably the best approach here.

</details>

It's always nice to receive some gratitude! If you have any more questions or need help with something else, feel free to ask. I'm here to assist you.
 
Interesting video on Chinese AI offering DeepSeek - they have a more capable engine than Silicon Valley offerings, trained using the limited Nvidia H800 chips they could still import.

 
Yup - I pointed out the progress that company was making a couple of weeks back, when another poster claimed "The models of AI are all closed and owned by US companies." They're absolutely not, and DeepSeek has amazed everyone.

That model was released open source 5 days ago and it's legitimately very good. The final training run cost only a few million - less than the top tech firms/labs pay in annual comp for a single researcher - and for some of those companies (such as Meta and Microsoft) it absolutely blows their offerings out of the water... it's very impressive what a team of "cracked" finance devs/quants can do, even with the constraints of embargoes.

I think o1 from OpenAI is still slightly ahead, but they're pretty comparable in some respects, and this model is available to everyone and can be used at very low cost.

It's also very helpful for improving smaller models. OpenAI explicitly bans using their models to train other models, but distillation works so well for smaller models - in practice, some people may have been breaking T&Cs regardless and using OpenAI or Anthropic models for their own fine-tunes, but DeepSeek-R1 provides a much cheaper and perfectly legal means of achieving the same.
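For anyone wondering what distillation involves mechanically, here's a minimal sketch of the classic soft-label recipe in PyTorch - the temperature, weighting and toy tensor shapes are all assumptions for illustration (distilling from an API-only model is usually done at the sequence level instead, by fine-tuning on the teacher's generated text):

```python
# Minimal knowledge-distillation loss (Hinton-style soft labels) in PyTorch.
# The student learns to match the teacher's softened output distribution;
# temperature/alpha and the toy tensor shapes are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescales gradients, per the original paper
    # Hard targets: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: a batch of 4 examples over a 10-class output.
student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
distillation_loss(student, teacher, labels).backward()
```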
 
Yup - I pointed out the progress that company was making a couple of weeks back, when another poster claimed "The models of AI are all closed and owned by US companies." They're absolutely not, and DeepSeek has amazed everyone.

That model was released open source 5 days ago and it's legitimately very good. The final training run cost only a few million - less than the top tech firms/labs pay in annual comp for a single researcher - and for some of those companies (such as Meta and Microsoft) it absolutely blows their offerings out of the water... it's very impressive what a team of "cracked" finance devs/quants can do, even with the constraints of embargoes.

I think o1 from OpenAI is still slightly ahead, but they're pretty comparable in some respects, and this model is available to everyone and can be used at very low cost.

It's also very helpful for improving smaller models. OpenAI explicitly bans using their models to train other models, but distillation works so well for smaller models - in practice, some people may have been breaking T&Cs regardless and using OpenAI or Anthropic models for their own fine-tunes, but DeepSeek-R1 provides a much cheaper and perfectly legal means of achieving the same.

Just curious how this is quantified for AI in general - how is it measured that one model is better than another?

Nate
 
Just curious how this is quantified for AI in general - how is it measured that one model is better than another?

Nate

There are known benchmarks in a variety of domains - software engineering, maths, physics etc. - and the models get evaluated against them: how many of the problems can they solve correctly? There is, of course, a risk of models having trained on the test set - which is why some test sets, like "Humanity's Last Exam", are kept hidden and use custom-written questions posed by top human experts in their fields.
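Mechanically a benchmark harness is straightforward - something like the sketch below, where ask_model is a hypothetical stand-in for whichever model you're testing (real harnesses put a lot of work into answer extraction and anti-contamination, which the naive matching here glosses over):

```python
# Minimal benchmark-harness sketch: pose each problem to the model,
# compare against the reference answer, report accuracy.
# `ask_model` is a hypothetical stand-in for a real model/API call.

PROBLEMS = [
    {"question": "What is 17 * 24?", "answer": "408"},
    {"question": "What is the derivative of x^3?", "answer": "3x^2"},
]

def ask_model(question: str) -> str:
    # Hypothetical stand-in - replace with a real API/model call.
    return "I think the answer is 408."

def evaluate(problems) -> float:
    correct = sum(
        1 for p in problems
        # Naive substring match; real harnesses parse answers carefully.
        if p["answer"] in ask_model(p["question"])
    )
    return correct / len(problems)

print(evaluate(PROBLEMS))  # 0.5 with the canned stub above
```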

There's also more subjective, feels-based evaluation from things like Chatbot Arena, where models can be compared and ranked by humans after interacting with them.
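Those pairwise human votes get turned into a leaderboard with an Elo-style rating, roughly like this - the K-factor and starting ratings are conventional chess-style assumptions, and the arena's actual statistical model is more sophisticated:

```python
# Elo-style rating update from one head-to-head vote, in the spirit of
# Chatbot Arena's leaderboard; K and the starting ratings are conventional
# chess-style assumptions, not the arena's exact method.
def elo_update(winner: float, loser: float, k: float = 32.0):
    expected = 1 / (1 + 10 ** ((loser - winner) / 400))  # winner's expected score
    return winner + k * (1 - expected), loser - k * (1 - expected)

model_a, model_b = 1000.0, 1000.0                # both start at the same rating
model_a, model_b = elo_update(model_a, model_b)  # a human preferred model A
print(model_a, model_b)                          # 1016.0 and 984.0 after one vote
```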

And there's just general feedback on what people find useful in day-to-day real-world tasks after using the models for hours - a model that has cheated on some narrow test sets could quickly fall apart in real-world application... if one is robust in the tests and holds up as useful in the real world, then that says something.
 
They still can't code and end up in a helpless loop on most problems - problems I can solve with a quick Google, or just by glancing through the code, even though I don't even know how to code...

They make assumptions, don't follow exact instructions and think they know best.

Sometimes when encountering a problem they will take almost everything out of a script apart from the broken part, without any warning...

I couldn't even get ChatGPT to write a script that captures keyboard inputs.

It literally couldn't get the correct keycodes on its own and just kept trying the same few codes in an endless loop... I had to tell it to import a keyboard module instead...

...which I figured out from Googling.

Literally, Googling beats asking ChatGPT in a lot of cases.
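For reference, the keyboard-module fix described above really is only a few lines - a minimal sketch using the third-party keyboard package (assumes pip install keyboard; on Linux/macOS it must run with root privileges):

```python
# Minimal key-capture sketch using the third-party `keyboard` package.
# Assumes `pip install keyboard`; on Linux/macOS it needs root privileges.
import keyboard

def on_key(event):
    # event.name is the readable key name, event.scan_code the raw code.
    print(f"pressed: {event.name} (scan code {event.scan_code})")

keyboard.on_press(on_key)  # fire the callback on every key press
keyboard.wait("esc")       # block until Esc is pressed, then exit
```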


They are also terrible at trying to help you train a model on a custom dataset.

It wanted to use YOLOv5; the model looked good, but the detection was crap - it thought one item was all over the screen.

I tried for hours asking ChatGPT if it was resizing things or doing something stupid...

I gave up and told it to use YOLOv8 instead - amazingly, YOLOv8 worked perfectly.

It must have been doing something dumb with YOLOv5 that it had wrongly learnt was correct.

For coding, it's about as useful as a DEI hire intern who barely knows how to code and has zero problem-solving abilities.
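For what it's worth, the YOLOv8 route described above is only a few lines with the ultralytics package - a minimal custom-dataset training sketch (the dataset path, epoch count and image size are illustrative assumptions):

```python
# Minimal custom-dataset training sketch with the `ultralytics` package
# (pip install ultralytics); the dataset path, epochs and image size are
# illustrative assumptions, not recommended settings.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")            # pretrained YOLOv8-nano weights
model.train(data="my_dataset.yaml",   # YAML describing classes + image paths
            epochs=50, imgsz=640)

results = model("test_image.jpg")     # run inference on a sample image
results[0].show()                     # visualise the detections
```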
 
They still can't code and end up in a helpless loop on most problems - problems I can solve with a quick Google, or just by glancing through the code, even though I don't even know how to code...

They make assumptions, don't follow exact instructions and think they know best.

Sometimes when encountering a problem they will take almost everything out of a script apart from the broken part, without any warning...

I couldn't even get ChatGPT to write a script that captures keyboard inputs.

It literally couldn't get the correct keycodes on its own and just kept trying the same few codes in an endless loop... I had to tell it to import a keyboard module instead...

...which I figured out from Googling.

Literally, Googling beats asking ChatGPT in a lot of cases.

They are also terrible at trying to help you train a model on a custom dataset.

It wanted to use YOLOv5; the model looked good, but the detection was crap - it thought one item was all over the screen.

I tried for hours asking ChatGPT if it was resizing things or doing something stupid...

I gave up and told it to use YOLOv8 instead - amazingly, YOLOv8 worked perfectly.

It must have been doing something dumb with YOLOv5 that it had wrongly learnt was correct.

For coding, it's about as useful as a DEI hire intern who barely knows how to code and has zero problem-solving abilities.
TBH you sound like a DEI hire trying to use AI, with terrible prompts and absolutely zero clue how to get the most out of it. It's great for those of us who do know how to use it effectively as a tool, though, so please don't change!
 
Yes, there are open models. The AI Act basically says that as soon as models reach 10^25 floating-point operations for training they become a risk to security. Not many, if any, hit that limit at the moment, but the companies are closing up and commercialising due to (a) operational costs and (b) the looming regulation pushing large models to close.
Interesting with DeepSeek - a key issue for me is how viable non-commercialised open source is in the long term. If they take a Red Hat approach, they're relying on support revenue to cover massive costs. I don't think that will work long term due to the training energy costs.
What is this 10^25 floating point barrier and how does it affect security?
 
What is this 10^25 floating point barrier and how does it affect security?

It's an arbitrary figure, originally picked out of the air to define the number of operations needed in training for the model to be considered dangerous.

Both the US and the EU have discussed this number, but the reality is that an AI can, or could, be more dangerous below that number.

In short: above that number, the US government requires you to house the model with restrictions and protections, which may mean it sits in a data centre on military grounds.
 
It's an arbitrary figure, originally picked out of the air to define the number of operations needed in training for the model to be considered dangerous.

Both the US and the EU have discussed this number, but the reality is that an AI can, or could, be more dangerous below that number.

In short: above that number, the US government requires you to house the model with restrictions and protections, which may mean it sits in a data centre on military grounds.
And by dangerous I assume they mean some sort of technology threat like a virus?
 
And by dangerous I assume they mean some sort of technology threat like a virus?

Basically, it could be used to engineer a nuclear weapon, a biological weapon etc. - all the sorts of things that are banned on the interwebs; you could simply ask the AI how to build them for you and bypass it all.
 
Basically, it could be used to engineer a nuclear weapon, a biological weapon etc. - all the sorts of things that are banned on the interwebs; you could simply ask the AI how to build them for you and bypass it all.
Can just imagine how many people would blow themselves up, both because they're morons and because the LLM could make up some nonsense. Either way, it's better for everyone that such information isn't made available.
 

There are a number of gotchas with DeepSeek:

1. The T&Cs indicate that all output is their IP.
2. Data goes in, stays in, and remains theirs for training etc.
3. It's been shown to generate code that has vulnerabilities (including using libraries whose known vulnerabilities hackers have exploited).
4. It's been shown to avoid CCP-sensitive issues, so there's a question mark over bias.
5. The front layers have been shown to overfit - this makes it great for specific test performance, but longer term it's not as performant.

So realistically nobody can use it for company work...

Having said that, there's some good stuff in there... which, like all good AI tech at the moment, will end up being ripped off left, right and centre.

So next month we'll see a host of other new models appearing.
 
^ Is that much different to the more established AI tools? I'm not trolling, I don't know.

Jensen now looking at faux-leather options for his ridiculous jackets though, can confirm.
 
Even if DeepSeek turn out to be untrustworthy, or fail to provide a great service and eventually disappear as a business, the fact that they've been able to rival ChatGPT on a shoestring budget in such a short span of time is incredible. And if we assume this isn't just a one-off, it opens the door to far more open access to AI technology.

I think governments will have to act quickly to regulate the development of AI models.
 