Imagine sitting across the table from someone who can recite every chess rule perfectly, discuss famous games in detail, and analyze complex positions with apparent sophistication.
Then you actually play a game with the guy, and he confidently moves his queen like a knight.
“Hey,” you say, “that’s illegal!”
“Brilliant observation!” your opponent replies. “I can sometimes make errors. Sorry I didn’t follow the rules. Let me try again.”
Then he tries another illegal move three turns later. Two turns after that, he does it again.
“Wow,” you think, “this genius doesn’t really understand chess at all.”
This isn’t hypothetical. It’s what tends to happen when you play chess with most large language models (LLMs), like ChatGPT. Despite being trained on vast amounts of chess content, these models still routinely make illegal moves.
AI researcher Gary Marcus argues this isn’t just a quirk—it’s evidence of a fundamental flaw in how LLMs “think.” They lack what cognitive scientists call world models: rich, internal representations of reality that track how things work in the world.
Large on Language. Short on Models.
It isn’t that LLMs lack chess knowledge. They know the rules in theory. But they don’t maintain a persistent internal model of the board or the state of play.
When humans play chess, we use mental models of the board. We can picture where the pieces sit, remember how they got there, and imagine potential moves (and countermoves!) in advance. We update our mental chessboards as the game progresses, and we don’t need much expertise to do it: Even a child learning chess quickly builds the ability to picture the board and imagine how the pieces move.
LLMs don’t do this. They’re like someone who has memorized thousands of conversations about chess but has never actually seen a board or sat across the table from another player. They process sequences of text tokens, recognize patterns in language, and generate plausible-sounding moves, but they don’t understand spatial relationships or follow the rules they can easily recite.
The result? Fluent, rule-derived commentary but nonsensical gameplay.
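This is also why most “chess with ChatGPT” experiments end up keeping the board state outside the model. Here is a minimal sketch of that workaround in Python, using the python-chess library to hold the world model the LLM lacks; `ask_llm_for_move` is a hypothetical stand-in for whatever prompt or API call you use to get a move from the model.

```python
# Sketch: keep the board state outside the LLM and reject illegal moves.
# Assumes the python-chess library; ask_llm_for_move is hypothetical.
import chess

def ask_llm_for_move(move_history: list[str]) -> str:
    """Hypothetical: prompt an LLM with the game so far, return its move in SAN."""
    raise NotImplementedError

def play_llm_move(board: chess.Board, history: list[str]) -> None:
    proposed = ask_llm_for_move(history)
    try:
        board.push_san(proposed)   # parses the move and rejects illegal ones
        history.append(proposed)
    except ValueError:
        # The "world model" lives here, in the library -- not in the LLM.
        print(f"Illegal move proposed: {proposed!r}")
```

The telling part is that the legality check has to live in ordinary software that tracks the board, because the model generating the moves doesn’t track it.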
Beyond the Chess Board
Marcus argues this distinction between human and LLM thinking matters far beyond the chessboard: It reveals a fundamental difference between what LLMs excel at, which is basically statistical pattern matching, and the kind of structured, causal reasoning that enables human beings to understand and navigate the physical world.
When you reach for a cup of water, your brain predicts the cup’s weight, anticipates how your muscles need to respond, and adjusts your grip before you even touch it. When you hear a friend’s voice on the phone, you don’t just process sound patterns. You activate a rich model of that person, complete with their likely mood, current circumstances, and probable responses to what you might say. You rapidly model the world, yours and theirs.
Day by day, hour by hour, humans use such “world models” to track how events unfold over time. We construct causal models that represent how actions lead to consequences. We maintain social models that track other people’s beliefs, goals, and likely behaviors.
These models can be accurate or inaccurate, wise or foolish, grounded in fact or intermingled with fiction. But they aren’t just abstract tokens or words. They’re more like active simulations that we constantly consult and update. They’re stories based on world models and vice versa. And they enable several human superpowers:
- Mental time travel: the ability to simulate past events and imagine future scenarios.
- Counterfactual reasoning: the ability to ask, “What would have happened if I had started this article by writing about something other than chess?”
- Narrative interpretation: the ability to track characters and their motivations across time.
They also enable us to detect inconsistencies between our models and the real world we observe: “Wait, if LLMs can’t grasp the world well enough to play chess, how could we possibly trust them to make important decisions?”
No World Models, No General Intelligence?
If Marcus is right, then simply scaling up current approaches is unlikely to solve the fundamental reliability problems LLMs encounter. Building bigger models, training on more data, and adding more computing power won’t fix the flaw that prevents ChatGPT from understanding the world like we do.
LLMs are amazing tools. They’re great at working with patterns in language. They can help us summarize text, explore ideas, and access information in user-friendly, conversational ways. They can serve as powerful tools for augmenting human thinking and learning, both of which rely heavily on the efficient and effective use of words.
But LLMs aren’t reliable replacements for intelligent humans. They aren’t systematic reasoners that can be trusted to maintain coherent understanding across complex, multi-step problems. And they certainly aren’t the kind of “general intelligence” that can rapidly understand and navigate novel real-world situations the way experienced people can.
The AI community (including Marcus) is exploring new approaches—like combining statistical learning with symbolic reasoning or world-model components—to address the challenges LLMs face. Meanwhile, we should all remember this: ChatGPT may sound sophisticated, but it lacks a fundamental grasp of reality.
It can’t help cheating at chess because it doesn’t know how to know better. We should treat what it tells us with caution.