Grok is now pretty good at answering Baldur’s Gate questions

By Febspot 21 Feb 2026 • 1 min read

Source: TechCrunch

Different AI labs have different priorities: OpenAI has focused on consumer users, Anthropic tends toward enterprises, and xAI has been placing particular emphasis on video-game walkthroughs. A recent in-depth report described a model release that was delayed for several days after Elon Musk was unhappy with how the chatbot handled detailed questions about Baldur’s Gate.

High-level engineers were pulled from other projects to improve the responses before launch. That decision prompted a small test. Our resident RPG enthusiast, Ram Iyer, assembled five general Baldur’s Gate questions and ran them against Grok and three other major models in a quick benchmark dubbed “BaldurBench.” Chat transcripts for Grok, ChatGPT, Claude and Gemini were published alongside the experiment.

Grok delivered pretty good information overall. Its answers leaned on gamer jargon such as “save-scumming” and “DPS” and favored tables and theorycraft, but the guidance was useful and well informed for readers who knew the terms.
