In a preprint research paper titled “Does GPT-4 Pass the Turing Test?”, two researchers from UC San Diego pitted OpenAI’s GPT-4 AI language model against human participants, GPT-3.5, and ELIZA to see which could most successfully trick participants into thinking it was human. In a Turing test, if the judge cannot reliably tell the chatbot from the human a certain percentage of the time, the chatbot is said to have passed. Along the way, the study, which has not been peer-reviewed, found that human participants correctly identified other humans in only 63 percent of the interactions, and that a 1960s computer program surpassed the AI model that powers the free version of ChatGPT. Even with these limitations and caveats, which we’ll cover below, the paper presents a thought-provoking comparison between AI model approaches and raises further questions about using the Turing test to evaluate AI model performance.

Source: 1960s chatbot ELIZA beat OpenAI’s GPT-3.5 in a recent Turing test study