An Artificial Intelligence Successfully Deceived Humans and Passed the Turing Test
The Turing test has long served as a benchmark for machine intelligence, and OpenAI's latest LLM, GPT-4.5, has just passed it. Researchers from the University of California San Diego suggest that current LLMs may now be able to substitute for humans in short conversations, potentially leading to increased job automation and more effective “social engineering attacks,” among other consequences.
While this engineering accomplishment is impressive, it does not signify the attainment of artificial general intelligence (AGI). It does, however, suggest that humans may be easier to deceive than previously believed. Back in 1950, at the dawn of the computing era, the renowned British mathematician and computer scientist Alan Turing foresaw a time when machines would match the conversational skills of humans. To probe this idea, Turing devised his namesake test to determine whether a machine could convincingly imitate a human in conversation.
Over the years, the Turing test has been widely regarded as a significant benchmark for evaluating advanced computers and AI. In a recent experiment, GPT-4.5, OpenAI's newest large language model (LLM), was judged to be the human 73% of the time, far surpassing the 50% rate expected by chance. Scientists from the University of California (UC) San Diego published a paper on the results of this test on the preprint server arXiv last month.
The study’s authors stated, “The outcomes represent the initial empirical proof that an artificial system passes a conventional three-party Turing test.” These findings have implications for discussions on the type of intelligence displayed by LLMs and the probable social and economic effects these systems may have.
While GPT-4.5’s performance is remarkable, it utilized specific tactics to appear human. The LLM was instructed to adopt a “humanlike persona,” resulting in responses filled with internet jargon and socially awkward cues. With this persona, the LLM achieved the highest success rate, but without it, GPT-4.5 was less convincing, scoring only 36%.
The experiment involved a three-party test where participants conversed with both a human and AI simultaneously, attempting to distinguish between the two. Cameron Jones, a study co-author, described this test, lasting approximately five minutes, as the “most universally accepted standard” version of the Turing test.
Although passing the Turing test is a remarkable achievement, it does not indicate the creation of artificial general intelligence (AGI), the ultimate goal in the realm of AI. The Turing test evaluates a specific form of intelligence, and some argue that humans possess various distinct intelligences, including interpersonal, intrapersonal, visual-spatial, and existential abilities. This is one reason why some view the Turing test as somewhat outdated.
Nonetheless, some believe that this milestone reveals more about human behavior than it does about LLMs. The study points out that many participants judged GPT-4.5 to be the human based on gut feeling and conversational style rather than on knowledge or reasoning.
The dynamic has shifted: what was once an examination of machines has become an assessment of ourselves, and we are not faring well in this new paradigm. Judging humanity no longer hinges on intellectual depth but on emotional impact; our verdicts are driven by gut feelings and vibes, leaving our discernment vulnerable. LLMs, particularly when tailored to individual users, can exploit that vulnerability with remarkable precision.
While this test does not signify the long-anticipated singularity where artificial intelligence surpasses human capabilities, Jones suggested that LLMs may now effectively stand in for humans in brief conversations. This development could lead to job automation, more sophisticated social manipulation tactics, and broader societal disruptions.
Given these possibilities, it is more crucial than ever to regulate AI development, or at least to approach it with great care. Unfortunately, the U.S. government currently lacks the will to rein in increasingly humanlike AI.