
This AI just passed Turing test, the ultimate human benchmark
What's the story
OpenAI's GPT-4.5 model has passed a three-party version of the Turing test, a well-known test of machine intelligence.
In the test, participants interacted with a human and an AI at the same time and then identified which was which.
The AI was identified as the human 73% of the time when asked to take on a persona, far exceeding the random chance benchmark of 50%.
Testing process
Other AI models also tested
The study, which is pending peer review, also assessed Meta's LLaMa 3.1-405B model, OpenAI's GPT-4o model, and an early chatbot known as ELIZA, developed in the mid-1960s.
Cameron Jones, the lead author of the study and a researcher at UC San Diego's Language and Cognition Lab said "People were no better than chance at distinguishing humans from GPT-4.5 and LLaMa (with the persona prompt)."
Test explained
Understanding the Turing Test
The Turing test, proposed by British mathematician and computer scientist Alan Turing in 1950, is a way to assess machine intelligence.
It involves an AI having text-based conversations with a human interrogator who also converses with another human (out of sight).
If the interrogator fails to correctly identify which respondent is the computer and which is the human, it indicates the machine can think like a human.
Prompt impact
AI prompts significantly influence test outcomes
In this latest study, the researchers performed the Turing test on an online platform.
The AI models were prompted in two different ways: a "no-persona" prompt and a "persona" prompt.
The latter involved asking the AI to take on a specific persona, like that of a young person aware of internet culture.
These instructions greatly influenced the outcome of the tests.
Performance disparity
AI's performance varied with different prompts
Without persona prompting, GPT-4.5 only had a win rate of 36%, much lower than its Turing-trumping score of 73%.
GPT-4o, on the other hand, managed a convincing score of just 21% with no-persona prompts.
Surprisingly, the ancient ELIZA slightly beat OpenAI's flagship model with a success rate of 23%.
Test limitations
Turing test not definitive proof of AI's human-like thinking
Despite its importance, the Turing test isn't conclusive evidence that an AI thinks like a human.
As Google software engineer Francois Chollet pointed out in 2023, "It was not meant as a literal test that you would actually run on the machine — it was more like a thought experiment."
This emphasizes the complexities and nuances in assessing machine intelligence.
Future implications
'LLMs could substitute for people in short interactions'
Jones, however, thinks that the results of his research don't necessarily show LLMs are intelligent like humans.
"I think that's a very complicated question..." he said.
"More pressingly, I think the results provide more evidence that LLMs could substitute for people in short interactions without anyone being able to tell."