ChatGPT - Seriously good potential (or just some Internet fun)

kissenger · 6 Mar 2024 at 15:48

We are on the verge of a once-in-a-century technological breakthrough, and we're going to ruin it by obsessing over 'diversity'. The future is going to belong to the countries able to create the best AI, and this Gemini crap is categorically detrimental to the entire West in that regard.

dowie · 6 Mar 2024 at 15:58

kissenger said:
We are on the verge of a once-in-a-century technological breakthrough, and we're going to ruin it by obsessing over 'diversity'. The future is going to belong to the countries able to create the best AI, and this Gemini crap is categorically detrimental to the entire West in that regard.

Sergey, one of the Google founders, admitted that was a mistake when asked about it at a recent hackathon. Not sure I can post the tweet/video here, though, as the guy who asked him about it is wearing a costume featuring a picture of a naked female torso...

Ayahuasca · 6 Mar 2024 at 16:12

dowie said:
It pretty much does indicate that, though. And there isn't a GPT4.5 Turbo (yet); there's a Turbo iteration of GPT4, and like I said, those are public benchmarks. GPT4 Turbo, for example, scores 92.5% on GSM8K, 54% on MATH and 73.17% on HumanEval; see here:

Gemini 1.5 Pro vs GPT-4 Turbo Benchmarks


bito.ai

And if you look at Anthropic's results, GPT4 scores 92%, 52.9% and 67% respectively on those, so Turbo isn't a huge improvement.

But if you look at Claude 3, it scores 95%, 60.1% and 84.9% respectively on those tests.
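To put those numbers side by side, here's a quick sketch that just works out the percentage-point differences. It only uses the scores quoted in this thread (GPT4 and Claude 3 from Anthropic's table, GPT4 Turbo from the bito.ai article); the labels and the `deltas` helper are my own, and the figures aren't independently checked.

```python
# Benchmark scores (%) exactly as quoted in this thread; not independently verified.
SCORES = {
    "GPT4": {"GSM8K": 92.0, "MATH": 52.9, "HumanEval": 67.0},          # Anthropic's table
    "GPT4 Turbo": {"GSM8K": 92.5, "MATH": 54.0, "HumanEval": 73.17},   # bito.ai article
    "Claude 3": {"GSM8K": 95.0, "MATH": 60.1, "HumanEval": 84.9},      # Anthropic's table
}


def deltas(new: str, old: str) -> dict[str, float]:
    """Percentage-point change from model `old` to model `new` on each benchmark."""
    return {
        bench: round(SCORES[new][bench] - SCORES[old][bench], 2)
        for bench in SCORES[old]
    }


if __name__ == "__main__":
    # Compare every other model against the original GPT4 scores.
    for model in SCORES:
        if model != "GPT4":
            print(f"{model} vs GPT4: {deltas(model, 'GPT4')}")
```

Run on the quoted figures, Turbo's gains over GPT4 come out at roughly 0.5 to 6 points, while Claude 3's are roughly 3 to 18 points. The GPT4 and Turbo numbers come from different sources, so treat the comparison as rough.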

No, that doesn't follow. There have been plenty of newer models released that didn't beat GPT4; beating GPT4 across a range of tasks is quite an achievement.

While that may be an indication, it remains to be seen in real-world usage. We'll find out over the next few weeks just how much better it is in all categories.

Also, there haven't been many large-scale models released to the public since GPT4 Turbo. There have been some niche models that focus on specific tasks, but none as broad as GPT4 Turbo and Claude 3. Gemini 1.5 Pro isn't public yet.

There are already some rudimentary examples on Twitter and Reddit where it certainly doesn't look "better". There are similar discussions every time one of the big models is updated, but opinions about how good it really is tend to change once people start using it for real-world activities. The models also tend to get a bit lobotomised as people try to "break" them or point out the stereotypes they produce, etc.

GPT4 is much more restrictive than it was at release, unless you use the Playground or the API and pay per token.

Anthropic's auto-detection system also seems to be broken when it comes to ToS violations, as many accounts have been banned in the last 24 hours for seemingly no reason - https://www.reddit.com/r/ClaudeAI/comments/1b76v1g/account_got_banned/ - https://www.reddit.com/r/ClaudeAI/s...56b8&iId=1ae0909e-9222-4ee3-a1a5-19bce4674c80

dowie · 6 Mar 2024 at 16:50

Ayahuasca said:
While that may be an indication, it remains to be seen in real-world usage. We'll find out over the next few weeks just how much better it is in all categories.

Oh, for sure, no objections here to seeing how it does in real-world usage. I was just pointing out that those aren't contrived benchmarks; they're standard open ones that the other models have already been tested on, so they do give a pretty solid indication of capabilities. Also, this isn't some small startup: Claude 2 is already well established as a good model beyond just doing well on benchmarks, so I don't think there's much reason to be overly skeptical about Claude 3.

Edit:

Also, on the positive side re: ad-hoc testing, there's already some very cool stuff coming out of this model. Here are a few of the things I've bookmarked over the past couple of days:

https://twitter.com/i/web/status/1764787911890014688

https://twitter.com/i/web/status/1765088860592394250

https://twitter.com/i/web/status/1764894816226386004

The last example is a bit overhyped by an overly worried AI safety person, but it's still pretty cool: the model knows it's being tested and spots the sentence.

Trusty · 13 Mar 2024 at 21:05

https://twitter.com/i/web/status/1767913661253984474

We're on the cusp of something crazy; the next couple of decades are going to get funky.

Minusorange · 15 Mar 2024 at 21:05

https://www.reddit.com/r/ChatGPT/comments/1bfa7s3/openai_cto_mira_murati_confirms_that_the_video/

In this interview I will hand the interviewee a hot potato and see how she handles it :cry:

Minusorange · 23 Mar 2024 at 23:51

Nvidia announces AI-powered health care 'agents' that outperform nurses — and cost $9 an hour


www.foxbusiness.com

High-powered chipmaker Nvidia has teamed up with artificial intelligence health care company Hippocratic AI to develop generative AI "agents" that not only outperform human nurses on video calls but cost a lot less per hour.

It's starting! I wonder how long it will be before we get AI nurses in the NHS, where if you want a human nurse you have to go private? Obviously that's some time off, given these are only video-call agents, but it's the beginning of AI taking over jobs.

Macky · 24 Mar 2024 at 08:11

Minusorange said:
Nvidia announces AI-powered health care 'agents' that outperform nurses — and cost $9 an hour


www.foxbusiness.com

It's starting! I wonder how long it will be before we get AI nurses in the NHS, where if you want a human nurse you have to go private? Obviously that's some time off, given these are only video-call agents, but it's the beginning of AI taking over jobs.

Given the AI ones outperform the humans, I wouldn't be against having an AI one. Obviously that doesn't account for the "human touch", bedside manner, etc., but from a purely technical perspective, I'm for it.