Reasoning and planning

Reasoning is not one capacity. It is a set of overlapping systems (fast pattern recognition, slow deliberation, model-based search, social inference, formal logic) that humans switch between, often unconsciously.

Dual-process theory

Daniel Kahneman popularised the System 1 / System 2 framing, borrowing the labels from earlier dual-process accounts (notably Stanovich and West), and it has become a useful shorthand. System 1 is fast, automatic, intuitive; System 2 is slow, deliberate, effortful. Both are necessary; both are fallible in different ways.

Much expert performance leans on System 1: a chess grandmaster often recognises a strong candidate move before any deliberate search. Error correction, novel problem solving, and explicit chains of reasoning lean mostly on System 2.

Model-based and model-free decision-making

Reinforcement learning makes a related distinction. Model-free agents learn cached values by trial and error: this state is worth that much, take the action with the highest value. Model-based agents simulate forward using an internal model of the environment: 'if I do this, then probably that.'
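The contrast can be sketched in a few lines of Python. This is an illustrative toy, not a claim about how any particular agent or brain implements it: the four-state line world, the `step` function, and every constant are assumptions made up for this example. The model-free agent only ever sees sampled transitions and caches values with tabular Q-learning; the model-based agent is handed the transition function and simulates forward with it.

```python
import random

# Hypothetical toy world: states 0..3 on a line, state 3 is the terminal goal.
N_STATES = 4
ACTIONS = (-1, +1)    # step left or step right
GAMMA = 0.9           # discount factor (assumed value)

def step(state, action):
    """Deterministic move along the line; reward 1 on entering the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def q_learning(episodes=500, alpha=0.5, seed=0):
    """Model-free: cache action values learned by trial and error."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            a = rng.choice(ACTIONS)  # random exploration; Q-learning is off-policy
            nxt, r = step(s, a)
            best_next = 0.0 if nxt == N_STATES - 1 else max(q[(nxt, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + GAMMA * best_next - q[(s, a)])
            s = nxt
    return q

def model_free_action(q, state):
    """Act by looking up the cached values; no simulation happens here."""
    return max(ACTIONS, key=lambda a: q[(state, a)])

def lookahead_value(state, depth):
    """Model-based: best discounted return found by simulating `depth` steps ahead."""
    if depth == 0 or state == N_STATES - 1:
        return 0.0
    return max(r + GAMMA * lookahead_value(nxt, depth - 1)
               for nxt, r in (step(state, a) for a in ACTIONS))

def model_based_action(state, depth=4):
    """Act by asking the model 'if I do this, then what?' for each action."""
    def score(a):
        nxt, r = step(state, a)
        return r + GAMMA * lookahead_value(nxt, depth - 1)
    return max(ACTIONS, key=score)
```

Both agents end up stepping right from every non-goal state, but they pay in different currencies: the model-free agent needs many episodes of experience and then acts with a single lookup, while the model-based agent needs no training but does fresh computation at every decision.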

Brains use both, often in parallel. The balance shifts: novel situations recruit model-based reasoning; over-learned situations drift to model-free habits.

Planning and search

Classical AI planning explicitly searches a tree of possible futures. AlphaGo-style systems keep that explicit search (Monte Carlo tree search) but guide it with learned policy and value networks. Language models, by contrast, mostly do not run an explicit search over candidate futures; they generate plausible continuations from learned priors. Chain-of-thought prompting and reasoning-trained models (OpenAI's o-series, DeepSeek-R1-style models) recover some search-like behaviour by spending more tokens on intermediate steps.
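To make "searching a tree of possible futures" concrete, here is a minimal sketch of a forward-search planner. The domain is a hypothetical toy chosen for brevity (states are positive integers, the two actions are "add 1" and "double"), and breadth-first search stands in for the much richer algorithms real planners use. The point is only that the plan is found by explicitly enumerating future states, not by predicting a likely continuation.

```python
from collections import deque

# Hypothetical toy domain: states are positive integers, actions are named functions.
ACTIONS = {
    "add 1": lambda n: n + 1,
    "double": lambda n: n * 2,
}

def plan(start, goal):
    """Breadth-first search over future states.

    Returns a shortest list of action names leading from start to goal,
    or None if the goal cannot be reached.
    """
    frontier = deque([(start, [])])   # (state, actions taken so far)
    visited = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for name, apply_action in ACTIONS.items():
            nxt = apply_action(state)
            # For positive integers both actions only increase the state,
            # so anything past the goal is a dead end and can be pruned.
            if nxt > goal or nxt in visited:
                continue
            visited.add(nxt)
            frontier.append((nxt, path + [name]))
    return None

print(plan(2, 11))  # ['double', 'add 1', 'double', 'add 1']
```

Because breadth-first search expands states in order of plan length, the first plan it returns is a shortest one. That guarantee costs memory and time that grow quickly with the branching factor, which is one reason practical systems lean on heuristics or learned guidance.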

How close this gets to real planning is contested. Some researchers see modern reasoning models as genuine search-in-language; others see them as elaborate pattern completion with extra computation budget.

What the brain does that LLMs don't

Humans plan over much longer horizons than current LLMs, do so with a far smaller working memory, and carry explicit representations of uncertainty. We backtrack, notice when a plan is bad, and adopt new strategies on the fly. Frontier LLMs are improving here but still struggle with very long horizons, executive control, and reliable self-correction.

Closing the planning gap is one of the most active research areas of 2026, with agents, tool use, and reasoning training all converging on the same goal.

Key terms

System 1 / System 2: Fast/automatic vs slow/deliberate cognitive processes.
Model-free: Decision-making by cached value estimates, no forward simulation.
Model-based: Decision-making by simulating forward with an internal world model.
Chain-of-thought: Prompting or training a model to produce intermediate reasoning tokens before its answer.
Executive function: The set of control processes that direct, monitor, and correct cognition.

Connects to AGI

A system that can pattern-match brilliantly but cannot reliably plan, backtrack, and self-correct is not AGI in any practically useful sense. The 2026 push into agents and reasoning models targets precisely this gap, attacking it from several sides.