Brain vs transformer

Comparing brains and transformers is fashionable and usually done badly. Here is a careful comparison along the axes that actually matter, with honest acknowledgement of what remains unknown.

Scale and compute

The human brain has roughly 86 billion neurons and 100–500 trillion synapses, running at about 20 watts. Estimates of its raw compute throughput vary by orders of magnitude, from 10^14 to 10^20 operations per second, depending on what counts as an 'operation': a spike, a synaptic event, or a finer-grained biophysical step.

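Frontier transformers in 2026 are estimated to have 10^12 to low 10^13 parameters (exact counts are rarely published) and run on data-centre clusters drawing megawatts. The power gap is the most striking single comparison: a cluster drawing tens of megawatts uses roughly six orders of magnitude more power than a 20-watt brain. The two are not doing the same job, so this is a ratio of power draw, not a like-for-like measure of efficiency.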
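As a rough check on the scale of that comparison, the ratio of power draws can be worked out directly. The sketch below is illustrative only: the 20 W figure comes from the text, while the three cluster sizes are hypothetical values chosen to span "megawatts", and the function name is made up for this example. It compares power draw, not energy per unit of useful work.

```python
import math

BRAIN_POWER_W = 20.0  # approximate figure quoted in the text


def power_gap_orders(cluster_power_w: float, brain_power_w: float = BRAIN_POWER_W) -> float:
    """Return log10 of the ratio of cluster power draw to brain power draw."""
    return math.log10(cluster_power_w / brain_power_w)


# Hypothetical cluster sizes (assumptions, not measurements).
for label, watts in [("1 MW", 1e6), ("20 MW", 2e7), ("100 MW", 1e8)]:
    print(f"{label}: {power_gap_orders(watts):.1f} orders of magnitude")
```

Under these assumptions the gap runs from a little under five to a little under seven orders of magnitude, so "about six" corresponds to a cluster in the tens of megawatts.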

Plasticity

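Brains learn continuously, online, mostly with local rules. Transformers learn offline in massive training runs and are then frozen at inference time. In-context learning adapts behaviour only within the context window and leaves the weights untouched, and on-the-fly fine-tuning is costly and prone to forgetting; both are very limited substitutes.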

Continual learning without catastrophic forgetting is unsolved for large models and routine for brains.

Architecture

Transformers are deep but strictly feed-forward within a single forward pass: a fixed stack of layers, each combining attention with a per-token feed-forward block. Brains are deeply recurrent: cortical regions feed back to one another extensively. Recurrence supports persistent activity, dynamic gating, and time-extended computation, which current transformer architectures only approximate by generating tokens autoregressively and re-reading their own output.
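The difference between a stateless feed-forward map and a recurrent unit with persistent activity can be shown with a toy example. This is a minimal sketch, not a model of cortex or of any real transformer: both functions, the single scalar unit, and the feedback value of 0.5 are assumptions made for illustration.

```python
def feedforward_response(inputs, weight=1.0):
    """Stateless map: each output depends only on the current input."""
    return [weight * x for x in inputs]


def recurrent_response(inputs, weight=1.0, feedback=0.5):
    """Leaky recurrent unit: h_t = feedback * h_{t-1} + weight * x_t."""
    h = 0.0  # hidden state carried from one time step to the next
    outputs = []
    for x in inputs:
        h = feedback * h + weight * x
        outputs.append(h)
    return outputs


# A single input pulse followed by silence.
pulse = [1.0, 0.0, 0.0, 0.0]
print(feedforward_response(pulse))  # [1.0, 0.0, 0.0, 0.0]
print(recurrent_response(pulse))    # [1.0, 0.5, 0.25, 0.125]
```

The feed-forward output drops to zero as soon as the input stops, while the recurrent unit keeps a decaying trace of the pulse. That carried state is what the text means by persistent activity.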

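Newer architectures are quietly importing some biological features. State-space models reintroduce a recurrent hidden state, and mixture-of-experts layers activate only a sparse subset of the network for each input, loosely echoing the sparse activity seen in brains.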
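Sparse activation in a mixture-of-experts layer can be sketched as top-k routing: a gate scores every expert, only the k best run, and their outputs are mixed. The code below is a simplified illustration under stated assumptions. The function names are hypothetical, experts are plain scalar functions, and the gate scores are passed in directly rather than learned; real systems differ in many details.

```python
import math


def top_k_route(gate_logits, k=2):
    """Select the k highest-scoring experts and softmax-normalise their weights."""
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    peak = max(gate_logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = [math.exp(gate_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return {i: e / total for i, e in zip(chosen, exps)}


def moe_forward(x, experts, gate_logits, k=2):
    """Run only the routed experts and return their weighted sum."""
    routes = top_k_route(gate_logits, k)
    return sum(weight * experts[i](x) for i, weight in routes.items())


# Four toy experts (assumed for illustration); only two are evaluated per input.
experts = [lambda x: x, lambda x: 2 * x, lambda x: -x, lambda x: x + 1]
print(moe_forward(3.0, experts, gate_logits=[0.1, 2.0, -1.0, 2.0], k=2))
```

With four experts and k=2, half of the expert functions are never called for this input, which is the sense in which only a sparse subset of parameters is used per token.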

Sample efficiency

A child hears or reads roughly 10–100 million words in the first decade of life. Frontier LLMs train on trillions of tokens. The gap is real and large. Brains close it by combining strong inductive priors, embodied multi-modal experience, active exploration, and episodic memory.

AI is starting to address each of these: better pretraining curricula, multimodal training, agentic data collection, and retrieval. None of it has yet closed more than a small part of the gap, which still spans several orders of magnitude.
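The size of the data gap can be put in numbers with the same kind of back-of-envelope arithmetic. This is a rough sketch: the child's word counts are the range given above, the 10^13 training tokens is an assumed stand-in for "trillions", and words and tokens are treated as comparable units, which is only approximately true.

```python
import math


def gap_in_orders(model_tokens: float, child_words: float) -> float:
    """Return log10 of the ratio of model training tokens to words a child encounters."""
    return math.log10(model_tokens / child_words)


CHILD_WORDS_LOW = 1e7        # lower end of the range in the text
CHILD_WORDS_HIGH = 1e8       # upper end of the range in the text
ASSUMED_MODEL_TOKENS = 1e13  # assumption standing in for "trillions"

smallest_gap = gap_in_orders(ASSUMED_MODEL_TOKENS, CHILD_WORDS_HIGH)
largest_gap = gap_in_orders(ASSUMED_MODEL_TOKENS, CHILD_WORDS_LOW)
print(f"gap: {smallest_gap:.0f} to {largest_gap:.0f} orders of magnitude")
```

Under these assumptions the gap is five to six orders of magnitude, so even a hundredfold improvement in sample efficiency would leave most of it in place.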

What this comparison does and doesn't tell us

The fact that brains beat transformers on energy, sample efficiency, and continual learning does not mean transformers are a dead end. It does mean current architectures will need substantial extensions before they match biology on those axes.

It also means brute-force scaling alone is an expensive bet. Most serious AGI roadmaps now combine scale with architectural and algorithmic innovation.

Key terms

Parameter: A learnable value (a weight or bias) in an artificial neural network.
Inference: Running a trained model on new input, with no updates to its weights.
Continual learning: Learning new tasks without forgetting old ones.
Recurrence: Neural connections that loop back, enabling time-extended computation.
Mixture-of-experts: Architecture in which only a sparse subset of parameters is used for each token.

Connects to AGI

The brain is the only existing general intelligence. It does not dictate how AGI must be built, but it does set hard reference points that any candidate AGI architecture should at least try to match or transcend.