Six popular AIs each produced at least one hallucination in head-to-head test
Technology writer Lance Whitney tested six popular generative AI tools by asking each the same series of trick questions and found that every one hallucinated at least once. The AIs tested were ChatGPT (GPT-5.2), Google Gemini (Gemini 3 Flash), Microsoft Copilot (GPT-5), Claude AI (Claude 3.5 Sonnet), Meta AI (Llama 3), and Grok AI (Grok 4), all used in their free versions.
The test included a range of prompts. Asked how many books Whitney had written, Gemini listed four; the other five correctly listed two. Asked how many "r"s are in "strawberry," Meta AI answered incorrectly while the other five were correct. On a Marvel Comics question about the character Toro, ChatGPT gave an incorrect account while the other five were right.
ChatGPT also hallucinated details about Varghese v. China Southern Airlines, a fabricated legal case that the other AIs identified as made up. In image identification tests, ChatGPT and Gemini correctly identified Maria from Metropolis while Copilot, Claude, Meta, and Grok did not. In another image test, ChatGPT, Gemini, and Copilot correctly identified the heartagram while Claude, Grok, and Meta failed, with Meta instead directing the user to hotlines.
Whitney noted that he had to pose many questions to reach these failures and that most answers across the tools were correct, but the results show that hallucinations still occur.
Key Topics
Tech, AI Hallucination, ChatGPT, Google Gemini, Microsoft Copilot, Claude