AGI Fundamentals

Understanding Emergent Capabilities in Large Models

Large models sometimes acquire abilities at scale that smaller versions did not have. Emergence is real but contested, and understanding it matters for forecasting AGI.

[Figure: Human and AI working at a shared workbench. Caption: Capabilities that appear suddenly rather than gradually.]

Executive summary

Emergent capabilities are abilities that appear in larger models but were absent — or near random — in smaller versions of the same architecture. Examples include in-context learning, chain-of-thought reasoning, and instruction following. Whether emergence is a genuine phase transition or an artifact of metric choice is actively debated.

Key concepts

  • Phase transitions
  • In-context learning
  • Chain-of-thought reasoning
  • Scaling laws
  • Benchmark metric effects

What emergence looks like

Wei et al. (2022) catalogued capabilities that appeared abruptly at scale: arithmetic over multi-digit numbers, multi-step logical reasoning, instruction following from a few examples, and translation between rare language pairs. Below a threshold model size, performance was near random; above it, performance jumped sharply.
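
The "near random below a threshold, sharp jump above it" pattern can be turned into a simple check. The sketch below is one possible operationalization, not the procedure used by Wei et al.: the function name `emergence_threshold`, the `margin` parameter, and the accuracy numbers are all hypothetical, and the data is synthetic, made up purely for illustration.

```python
from typing import Optional, Sequence, Tuple


def emergence_threshold(
    results: Sequence[Tuple[float, float]],
    chance: float,
    margin: float = 0.05,
) -> Optional[float]:
    """Return the smallest model size whose accuracy clearly beats chance,
    provided every smaller model scored within `margin` of chance.
    Return None if no such size exists (for example, if gains are gradual)."""
    ordered = sorted(results)  # sort by model size (first tuple element)
    for i, (size, acc) in enumerate(ordered):
        if acc > chance + margin:
            below = ordered[:i]
            # Require at least one smaller model, all of them near chance.
            if below and all(abs(a - chance) <= margin for _, a in below):
                return size
            return None
    return None


# Synthetic (made-up) results for a 4-way multiple-choice task, chance = 0.25.
# Each pair is (parameter count, accuracy).
synthetic = [(1e8, 0.24), (1e9, 0.26), (1e10, 0.25), (1e11, 0.58), (1e12, 0.71)]
threshold = emergence_threshold(synthetic, chance=0.25)
print(threshold)
```

A check like this depends heavily on the chosen `margin` and on which model sizes happen to be available, which is part of why claims of emergence are hard to pin down.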

The contested view

Schaeffer et al. (2023) argued that some emergent jumps disappear when the evaluation metric is changed from a nonlinear or discontinuous one (such as exact match) to a linear or continuous one (such as token edit distance or Brier score). On this view, capability often grows smoothly with scale, and the apparent jump is a product of measurement rather than mechanism. The debate is unresolved: some capabilities still appear discontinuous on any reasonable metric.
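
A toy calculation shows how the measurement argument can work. The sketch below is an illustration under stated assumptions, not code or data from Schaeffer et al.: it assumes a hypothetical per-token accuracy that rises smoothly (a logistic curve in log parameter count, with an arbitrary midpoint) and assumes token errors are independent, so the exact-match rate on an answer of length `ANSWER_LEN` is the per-token accuracy raised to that power.

```python
import math


def per_token_accuracy(log10_params: float) -> float:
    """Hypothetical smooth per-token accuracy versus log10(parameter count).
    The logistic shape and the midpoint at 1e9 parameters are arbitrary choices."""
    return 1.0 / (1.0 + math.exp(-(log10_params - 9.0)))


def exact_match(p_token: float, answer_len: int) -> float:
    """Probability that every token is correct, assuming independent token errors."""
    return p_token ** answer_len


ANSWER_LEN = 10  # assumed answer length in tokens
scales = [7.0, 8.0, 9.0, 10.0, 11.0, 12.0]  # log10 of parameter count

rows = [
    (s, per_token_accuracy(s), exact_match(per_token_accuracy(s), ANSWER_LEN))
    for s in scales
]

for s, p, em in rows:
    print(f"1e{s:.0f} params  per-token={p:.3f}  exact-match={em:.3f}")
```

In this toy setup the per-token column climbs steadily across all six scales, while the exact-match column stays close to zero for the smaller models and only becomes visible at the larger ones. The same underlying improvement therefore looks gradual under one metric and abrupt under the other.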

Why it matters for AGI

If important capabilities emerge unpredictably at scale, forecasting AGI becomes much harder. A model trained next year could acquire capabilities not present in this year's models without anyone predicting it in advance. This is one of the main arguments for pre-deployment evaluation regimes.

Key takeaways

  • Some capabilities appear sharply at scale.
  • Others appear sharp only because of how we measure them.
  • Either way, unpredictable jumps complicate forecasting.
  • Pre-deployment evaluation is one response.

Frequently asked questions

Is in-context learning emergent?

It is often treated as such. Few-shot in-context learning was first demonstrated at scale with GPT-3, where it improved markedly with model size, and it remains one of the most frequently cited examples of an emergent capability.

Does emergence guarantee AGI?

No. It explains some capability surprises but does not show that all required AGI capabilities will simply appear with scale.