ChatGPT - Seriously good potential (or just some Internet fun)

Associate
Joined
10 Apr 2008
Posts
2,487
We are on the verge of a once in a century technological breakthrough, and we're going to ruin it by obsessing over 'diversity'. The future is going to belong to the countries able to create the best AI, and this Gemini crap is categorically detrimental to the entire West in that regard.
 
Caporegime
Joined
29 Jan 2008
Posts
58,913
We are on the verge of a once in a century technological breakthrough, and we're going to ruin it by obsessing over 'diversity'. The future is going to belong to the countries able to create the best AI, and this Gemini crap is categorically detrimental to the entire West in that regard.

One of the Google founders, Sergey Brin, admitted that was a mistake when asked about it at a recent hackathon. Not sure I can post the tweet/video here though, as the guy who asked him about it is wearing a costume featuring a picture of a naked female torso...
 
Caporegime
Joined
23 Apr 2014
Posts
29,520
Location
Bell End, near Lickey End
It pretty much does indicate that though, and there isn't a GPT4.5 turbo (yet). There's a turbo iteration of GPT4 and, like I said, those are public benchmarks. GPT4 turbo, for example, scores 92.5% on GSM8K, 54% on MATH and 73.17% on HumanEval, see here:

And if you look at Anthropic's results, GPT4 scores 92%, 52.9% and 67% respectively on those, so turbo isn't a huge improvement.

But then if you look at Claude 3, it scores 95%, 60.1% and 84.9% respectively on those tests.
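
To put rough numbers on that, here's a quick sketch that just subtracts the scores quoted above. The `scores` table and the `gap` helper are made up for illustration, and the figures are only the ones from this post, which may not all have been measured under identical prompting setups.

```python
# Scores (percent) exactly as quoted in this post; nothing here is re-measured.
scores = {
    "GPT4": {"GSM8K": 92.0, "MATH": 52.9, "HumanEval": 67.0},
    "GPT4 turbo": {"GSM8K": 92.5, "MATH": 54.0, "HumanEval": 73.17},
    "Claude 3": {"GSM8K": 95.0, "MATH": 60.1, "HumanEval": 84.9},
}


def gap(model_a, model_b):
    """Percentage-point lead of model_a over model_b on each benchmark."""
    return {
        bench: round(scores[model_a][bench] - scores[model_b][bench], 2)
        for bench in scores[model_a]
    }


if __name__ == "__main__":
    # Turbo over the original GPT4, then Claude 3 over turbo.
    print("GPT4 turbo vs GPT4:", gap("GPT4 turbo", "GPT4"))
    print("Claude 3 vs GPT4 turbo:", gap("Claude 3", "GPT4 turbo"))
```

On these figures, Claude 3's lead over turbo is bigger than turbo's lead over the original GPT4 on all three tests.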

No, that doesn't follow. Plenty of other newer models have been released that didn't beat GPT4, so beating GPT4 across a range of tasks is quite an achievement.

While that may be an indication, it remains to be seen in real world usage. We'll find out over the next few weeks just how much better it is in all categories.

Also, there haven't been many large-scale models released to the public since GPT4 turbo. There have been some niche models that focus on specific tasks, but none as broad as GPT4 turbo and Claude 3. Gemini 1.5 Pro isn't public yet.

There are already some rudimentary examples on Twitter and Reddit where it certainly doesn't look "better". There are similar discussions every time one of the big models is updated, but opinions tend to change about how good it really is when people start using it for real-world activities. They tend to get a bit lobotomised as people try to "break" them or point out the stereotypes they make, etc.

GPT4 is much more restrictive than it was at release, unless you use the playground or API and pay per token.

Anthropic's auto-detection system also seems to be broken when it comes to ToS violations, as many accounts have been banned in the last 24 hours for seemingly no reason - https://www.reddit.com/r/ClaudeAI/comments/1b76v1g/account_got_banned/ - https://www.reddit.com/r/ClaudeAI/s...56b8&iId=1ae0909e-9222-4ee3-a1a5-19bce4674c80
 
Caporegime
Joined
29 Jan 2008
Posts
58,913
While that may be an indication, it remains to be seen in real world usage. We'll find out over the next few weeks just how much better it is in all categories.

Oh for sure, no objections here to seeing what happens in terms of real-world usage. I was just pointing out that those aren't contrived benchmarks; they're standard open ones that the other models have already been tested under, so that does actually give a pretty solid indication of capabilities. Also, this isn't some small startup. Claude 2 is already well established as a good model beyond just passing benchmarks well, so I don't think there is much to be overly skeptical about here re: Claude 3. :)

edit

Also on the positive side re: ad-hoc testing, there is some very cool stuff so far with this model, these are a few of the things I bookmarked in the past couple of days:


The last example is a bit overhyped by some overly worried AI safety person, but it's still pretty cool: the model knows it's being tested and spots the sentence.
 
Soldato
Joined
25 Nov 2005
Posts
12,454

High-powered chipmaker Nvidia has teamed up with artificial intelligence health care company Hippocratic AI to develop generative AI "agents" that not only outperform human nurses on video calls but cost a lot less per hour.

It's starting! I wonder how long it will be before we get AI nurses in the NHS, where if you want a human nurse you have to go private? Obviously some time off, given these are only video call agents, but it's the beginning of AI taking over jobs.
 
Soldato
Joined
5 Apr 2009
Posts
6,056
Location
West Midlands



It's starting! I wonder how long it will be before we get AI nurses in the NHS, where if you want a human nurse you have to go private? Obviously some time off, given these are only video call agents, but it's the beginning of AI taking over jobs.

Given the AI ones outperform the humans, I wouldn't be against having an AI one. Obviously that doesn't account for the "human touch" or bedside manner etc. But from a purely technical perspective, I'm for it.
 