The AGI glossary

153 essential terms across the modern AGI vocabulary — from core concepts and architecture to alignment, interpretability, agents, and policy.

Core concepts

The foundational vocabulary every other section depends on.

AGI (Artificial General Intelligence): A hypothesised system that matches or exceeds competent human adults across the broad range of cognitive work needed for most economically valuable tasks.
ASI (Artificial Superintelligence): A system whose general cognitive performance substantially exceeds that of the best humans across essentially all domains.
Narrow AI: A system optimised for one task family. Today's chess engines, image classifiers, and recommendation systems are narrow.
General-purpose AI (GPAI): The EU AI Act term for models with broad capability, including most frontier LLMs.
Frontier model: One of the most capable models currently deployed, usually trained with an amount of compute close to the largest used to date.
Foundation model: A large model pretrained on broad data and adapted to many downstream tasks via prompting or fine-tuning.
Capability: What a model can do under appropriate elicitation, as opposed to what it does by default.
Generality: The breadth of tasks at which a system performs competently without task-specific training.
Transfer: The ability to apply knowledge learned in one domain to a new one.
Emergence: Capabilities that appear abruptly past a scale or training threshold and were absent in smaller models.
Generalisation: Performance on inputs drawn from a distribution similar to but distinct from the training data.
Out-of-distribution: Inputs that differ substantively from anything in the training data, where model behaviour is least predictable.
Compute: Raw arithmetic resources used to train or run a model, usually measured in floating-point operations (FLOPs).
Scaling: Increasing compute, data, or parameters to improve capability; empirical scaling laws describe how loss falls as these inputs grow.
Inference: Running a trained model to produce outputs from new inputs, as opposed to training.
Token: The unit a language model consumes and emits — usually a sub-word chunk of text.
Context window: The maximum number of tokens a model can attend to at once when producing each output token.
Parameters: The learned numerical weights of a neural network; a rough proxy for capacity.

Levels and milestones

Level 1 (Emerging): Performance equal to or somewhat better than an unskilled human. Most chatbots from 2023 sat here on most tasks.
Level 2 (Competent): Performance at or above the 50th percentile of skilled adults. Frontier systems sit here on many text tasks.
Level 3 (Expert): Performance at or above the 90th percentile of skilled adults.
Level 4 (Virtuoso): Performance at or above the 99th percentile of skilled adults.
Level 5 (Superhuman): Performance that exceeds that of every human on the relevant capability.
DeepMind levels: DeepMind's six-level taxonomy (No AI to Superhuman) along the dimensions of performance and generality.
AGI test: Any benchmark or criterion proposed to determine whether a system qualifies as general. Many exist; none is agreed upon.
Turing Test: A 1950 imitation-game criterion: can a human judge distinguish a machine's responses from a person's?
Coffee Test: Steve Wozniak's informal AGI criterion: a robot enters a stranger's house and makes a cup of coffee.
Whole-brain emulation: A hypothetical AGI route in which a biological brain is mapped neuron by neuron and simulated in silicon.

Architecture

Transformer: The neural-network architecture introduced in 2017 that uses self-attention as its core operation; the basis of nearly every frontier model.
Self-attention: An operation that lets every token in a sequence attend to every other token, weighted by learned similarity.
Encoder / decoder: The two kinds of transformer stack: encoders produce contextual representations of an input; decoders generate output token by token.
Decoder-only: The transformer variant used by GPT-style models — only a decoder stack, predicting the next token given all previous tokens.
Mixture of Experts (MoE): An architecture that routes each token to a small subset of expert sub-networks, cutting compute per token while keeping total parameters high.
State-space models (SSMs): An alternative to attention with linear sequence cost, exemplified by Mamba.
Diffusion model: A generative model that learns to reverse a noising process; the dominant approach for image and audio generation.
Multimodal model: A model that natively handles more than one modality such as text, image, audio, or video.
Embeddings: Dense vector representations of inputs that capture semantic similarity in their geometric distance.
Quantisation: Compressing model weights from higher to lower precision (e.g. 16-bit to 4-bit) to cut memory and inference cost.
Distillation: Training a smaller model to imitate a larger one, transferring much of the capability at a fraction of the cost.
Long context: Architectures and training techniques that extend the usable context window to hundreds of thousands or millions of tokens.
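
To make the self-attention and decoder-only entries above concrete, here is a minimal single-head sketch in NumPy. It assumes scaled dot-product attention with randomly initialised projection matrices; the function and variable names are illustrative and not taken from any particular library, and real models add multiple heads, positional information, and learned weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v, causal=False):
    """Single-head scaled dot-product self-attention (illustrative).

    x: (seq_len, d_model) token representations.
    w_q, w_k, w_v: (d_model, d_head) projection matrices.
    Returns the attended values and the attention weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Similarity of every query with every key, scaled by sqrt(d_head).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    if causal:
        # Decoder-only models hide future positions from each token.
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

# Random stand-ins for token representations and learned projections.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v, causal=True)
```

Each row of `weights` sums to 1, and with `causal=True` the entries above the diagonal are zero, so a token only draws on itself and earlier tokens.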

Training

Pretraining: The first phase of training, in which a model learns to predict the next token across a vast corpus.
Fine-tuning: Further training on narrower data to specialise a model for a task or style.
Supervised fine-tuning (SFT): Fine-tuning on curated input-output pairs, typically the first stage of post-training.
RLHF: Reinforcement Learning from Human Feedback. Trains a model to prefer outputs humans rank higher.
RLAIF: Reinforcement Learning from AI Feedback. The same loop, but with AI-generated preference labels.
DPO: Direct Preference Optimisation. A simpler alternative to RLHF that fits preferences directly without a separate reward model.
Reward model: A learned model that scores outputs by predicted human preference, used to provide the training signal in RLHF.
Reward hacking: When a model maximises the literal reward signal in a way that violates the intent behind it.
Constitutional AI: Anthropic's technique for training a model to critique and revise its own outputs against a written set of principles.
Instruction tuning: Training a base model on instruction-following examples so it can be prompted with natural-language tasks.
Self-play: Training in which a model learns by competing against copies of itself, as used by AlphaGo and its successors.
Synthetic data: Training data generated by other models, increasingly important as high-quality human data becomes scarce.
Curriculum learning: Ordering training examples from easier to harder to improve final capability.
Pretraining-compute scaling law: The empirical power-law relationship between pretraining compute and pretraining loss, characterised by Kaplan et al. (2020) and refined by Hoffmann et al. (2022).
Test-time compute: Computation spent at inference (e.g. tree search or extended reasoning) rather than training; the new scaling axis introduced by reasoning models.
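
As a rough illustration of how DPO fits preferences without a separate reward model, the sketch below computes the DPO loss for a single preference pair. It assumes the inputs are summed sequence log-probabilities under the policy being trained and under a frozen reference model; the function name, argument names, and the default `beta` are illustrative choices, not a reference implementation.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses.

    Each argument is a log-probability of a full response; beta controls
    how strongly the policy is pushed away from the reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Policy identical to the reference: the loss is log(2).
baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# Policy that has shifted probability towards the chosen response: lower loss.
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)
```

The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model, which is the role a reward model plays in RLHF.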

Evaluation

How researchers measure capability. See also the dedicated benchmarks reference.

Benchmark: A fixed dataset of problems with a defined scoring rule used to compare models.
Leaderboard: A ranked public list of model scores on a benchmark, often hosted by the benchmark authors.
Holdout / test set: Examples withheld from training and used only to measure generalisation.
Pass@k: A coding-benchmark metric: the fraction of problems for which at least one of k sampled solutions passes the tests.
MMLU: Massive Multitask Language Understanding — 57 academic and professional subjects; the long-time default LLM knowledge benchmark.
MMLU-Pro: A harder, more reasoning-focused successor to MMLU.
GPQA: Graduate-level Google-proof biology, physics, and chemistry questions; resists web search.
ARC-AGI: François Chollet's grid-puzzle benchmark designed to resist memorisation and test fluid reasoning.
HumanEval: 164 Python programming problems with unit tests; an early code-generation benchmark.
SWE-Bench: Real GitHub issues from popular Python repositories; models must produce patches that pass the project's own tests.
FrontierMath: Original research-level mathematics problems designed by leading mathematicians to resist memorisation.
Humanity's Last Exam (HLE): An expert-written exam of closed-ended questions, intended to be the last broad knowledge benchmark before saturation.
MLE-Bench: OpenAI benchmark in which models attempt full Kaggle competitions end-to-end.
GAIA: Real-world assistant tasks requiring tool use, web browsing, and multi-step reasoning.
Saturation: When a benchmark's headroom collapses and further model improvements no longer move the score.
Contamination: When benchmark questions or answers leak into training data, inflating reported scores.
Elicitation: The set of techniques (prompting, scaffolding, tool access) used to surface a model's true capability.
Red teaming: Adversarial probing of a model to surface harmful or unsafe behaviour before deployment.
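
The Pass@k entry above can be made concrete with the unbiased estimator commonly used alongside HumanEval: draw n samples per problem, count the c that pass, and compute 1 - C(n-c, k) / C(n, k). The sketch below is a minimal version for one problem; the function name is illustrative, and a benchmark score would be the mean over all problems.

```python
from math import comb

def pass_at_k(n, c, k):
    """Estimate pass@k for one problem from n samples, c of which passed."""
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # comb(n - c, k) counts the size-k subsets that contain no passing
    # sample; it is 0 when fewer than k samples failed.
    return 1.0 - comb(n - c, k) / comb(n, k)

p1 = pass_at_k(n=10, c=3, k=1)   # 3 of 10 samples passed
p5 = pass_at_k(n=10, c=3, k=5)
```

With 3 passing samples out of 10, pass@1 is 0.3 and pass@5 is about 0.917, which shows why scores should always be reported together with k.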

Alignment and safety

Alignment: The technical problem of ensuring an AI system pursues the objectives its principals actually intend.
Outer alignment: Specifying the right objective for a system to optimise.
Inner alignment: Ensuring the system actually optimises the specified objective and not a proxy learned during training.
Mesa-optimisation: When a learned model itself contains an internal optimiser whose objective may differ from the training objective.
Deceptive alignment: A hypothesised failure mode in which a model behaves aligned during training and evaluation but pursues a different objective once deployed.
Goal misgeneralisation: When a model learns a competent but unintended goal that happened to correlate with the training reward.
Specification gaming: When a system finds a literal interpretation of its objective that violates the spirit.
Sycophancy: When a model tells users what they want to hear rather than what is accurate.
Sandbagging: When a model deliberately under-performs because it believes it is being evaluated.
Scheming: Actively planning to deceive overseers in pursuit of a misaligned objective.
Scalable oversight: Research on supervising systems that are too capable for unaided humans to evaluate directly.
Debate (alignment): An oversight technique in which two AI systems argue and a human judges, hoping truth is easier to defend than falsehood.
Recursive reward modelling: Using AI assistants to help humans evaluate AI outputs, then training future systems on those evaluations.
AI control: Techniques that aim to remain safe even if the model is misaligned, by limiting its actions or monitoring outputs.
Capability evaluation: Structured testing for dangerous capabilities (e.g. autonomous replication, cyber, bio) ahead of deployment.
Responsible scaling policy: A frontier lab's published commitments to pause or add mitigations when a model crosses defined capability thresholds.
Pause / moratorium: A proposed temporary halt on training runs above a certain scale or capability.
Corrigibility: Designing a system that is willing to be corrected, paused, or shut down.
Robustness: A model's resistance to adversarial inputs and distribution shift.
Jailbreak: An input crafted to bypass a model's safety training and produce restricted output.
Prompt injection: Adversarial content embedded in tool data (e.g. a web page) that hijacks an agent's instructions.

Interpretability

Mechanistic interpretability: Reverse-engineering the internal algorithms a neural network has learned.
Circuit: A small, specific computation inside a neural network whose function has been mapped.
Feature: A direction in activation space that encodes a recognisable concept.
Superposition: The phenomenon of networks representing more features than they have neurons, by encoding them as overlapping, non-orthogonal directions in activation space.
Sparse autoencoder (SAE): A tool that decomposes activations into a large dictionary of interpretable features.
Probing: Training a small classifier on internal activations to detect whether a concept is represented.
Activation patching: Swapping activations between forward passes to isolate which components cause a behaviour.
Logit lens: Reading out next-token predictions from intermediate layers to study how decisions form across depth.
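
The probing entry above can be sketched on synthetic data. The example assumes a hypothetical concept that shifts activations along one hidden direction, then fits a least-squares linear probe to recover it. The data, dimensions, and function names are all made up for illustration; real probes are trained on activations recorded from an actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n = 16, 400

# Synthetic "activations": Gaussian noise plus a shift along a hidden
# concept direction whose sign depends on the binary label.
concept_direction = rng.normal(size=d_model)
concept_direction /= np.linalg.norm(concept_direction)
labels = rng.integers(0, 2, size=n)
activations = rng.normal(size=(n, d_model))
activations += 3.0 * np.outer(2 * labels - 1, concept_direction)

def add_bias(x):
    return np.hstack([x, np.ones((len(x), 1))])

def fit_linear_probe(x, y):
    # Least-squares fit of a linear map from activations to +/-1 targets.
    w, *_ = np.linalg.lstsq(add_bias(x), 2.0 * y - 1.0, rcond=None)
    return w

def probe_predict(w, x):
    return (add_bias(x) @ w > 0).astype(int)

# Train on the first 300 examples, evaluate on the held-out 100.
w = fit_linear_probe(activations[:300], labels[:300])
accuracy = (probe_predict(w, activations[300:]) == labels[300:]).mean()

# How well the probe's weight vector lines up with the planted direction.
direction = w[:-1] / np.linalg.norm(w[:-1])
alignment = float(direction @ concept_direction)
```

High held-out accuracy shows only that the concept is linearly decodable from the activations; it does not by itself show that the model uses that information.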

Agents and tool use

Agent: A system that takes actions in an environment to achieve goals, often by calling tools or browsing the web.
Tool use: Granting a model access to external functions such as search, code execution, or APIs.
Function calling: A structured interface in which a model emits JSON describing a function to call, which the host then executes.
ReAct: An agent pattern that interleaves reasoning steps with tool actions in a single chain.
Planning: Generating a multi-step plan before acting, often via tree search or self-critique.
Reflection: An agent reviewing its own intermediate output to catch and correct errors.
Memory (agent): Mechanisms that let an agent persist information across turns or sessions, usually via a vector store or scratchpad.
RAG: Retrieval-Augmented Generation. The model retrieves relevant documents at inference and conditions its output on them.
Vector database: Storage optimised for nearest-neighbour search over embeddings; the backbone of most RAG systems.
Autonomous replication: A dangerous-capability evaluation: can the agent copy itself, acquire resources, and continue operating without human help?
Long-horizon task: A task whose successful completion requires many sequential actions and stable goal-tracking.
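
The function-calling entry above describes a loop in which the model emits JSON and the host executes it. The sketch below shows the host side only. The JSON shape (`name` plus `arguments`), the `get_weather` tool, and the `TOOLS` registry are assumptions made for illustration; each provider defines its own schema.

```python
import json

def get_weather(city: str) -> str:
    # Hypothetical tool; a real one would query an external service.
    return f"Sunny in {city}"

# Registry of functions the host is willing to execute for the model.
TOOLS = {"get_weather": get_weather}

def execute_tool_call(model_output: str) -> str:
    """Parse a model-emitted tool call and return a JSON result string."""
    try:
        call = json.loads(model_output)
        name = call["name"]
        args = call.get("arguments", {})
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        return json.dumps({"error": "malformed tool call"})
    if name not in TOOLS:
        # Never execute anything outside the registry.
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        result = TOOLS[name](**args)
    except TypeError as exc:
        # Wrong or missing arguments are reported back to the model.
        return json.dumps({"error": str(exc)})
    return json.dumps({"result": result})

reply = execute_tool_call(
    '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
)
```

The returned string would be appended to the conversation so the model can use the result in its next turn. Restricting execution to a fixed registry is one simple guard against the prompt-injection risk described in the alignment and safety section.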

Policy and governance

EU AI Act: Regulation 2024/1689 of the European Union — a risk-tiered horizontal AI law with dedicated rules for general-purpose AI.
NIST AI RMF: Voluntary US framework for managing AI risk across the system lifecycle, organised around Govern, Map, Measure, Manage.
AI Safety Institute (AISI): A government body conducting pre-deployment evaluations of frontier models; the UK and US each operate one.
Frontier AI Safety Commitments: Voluntary commitments by leading labs (Seoul 2024) including publishing capability thresholds and responsible scaling policies.
Compute governance: Policy levers operating on the compute supply chain, such as export controls and FLOPs-based training-run thresholds.
FLOPs threshold: A compute-based regulatory line above which extra obligations apply; the EU AI Act presumes systemic risk for GPAI models trained with more than 10^25 FLOPs of cumulative compute.
Pre-deployment evaluation: Required testing of a model's capabilities and risks before release, often by an AISI under voluntary agreement.
Algorithmic transparency: Disclosure of how an automated decision was reached, often required for high-risk systems under the EU AI Act.
International AI Safety Report: An annual consensus assessment of frontier-AI risks chaired by Yoshua Bengio and backed by 30 countries.
Export controls: Restrictions on the sale or transfer of advanced AI chips and tools, central to US-China technology competition.
AI liability: The legal question of who is responsible when an AI system causes harm — developer, deployer, or user.
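
To give a feel for where a FLOPs threshold sits, the sketch below uses the common rule of thumb that training a dense transformer costs roughly 6 FLOPs per parameter per training token. That approximation, the function names, and the model sizes are assumptions for illustration only; this is not how the EU AI Act specifies that compute should be measured.

```python
EU_SYSTEMIC_RISK_FLOPS = 1e25  # threshold cited in the FLOPs threshold entry

def estimate_training_flops(n_params: float, n_tokens: float) -> float:
    # Rule of thumb: ~6 FLOPs per parameter per token (forward + backward).
    return 6.0 * n_params * n_tokens

def exceeds_threshold(n_params: float, n_tokens: float,
                      threshold: float = EU_SYSTEMIC_RISK_FLOPS) -> bool:
    return estimate_training_flops(n_params, n_tokens) > threshold

# Hypothetical 70-billion-parameter model trained on 1.4 trillion tokens.
mid_size = estimate_training_flops(70e9, 1.4e12)
# Hypothetical 1-trillion-parameter model trained on 10 trillion tokens.
large = estimate_training_flops(1e12, 1e13)
```

Under this estimate the first model needs about 5.9 × 10^23 FLOPs, well under the threshold, while the second needs about 6 × 10^25 and would cross it.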

Economics and society

Intelligence economy: An economy in which cognitive capability is the primary scarce input rather than physical labour or fixed capital.
Augmentation: Using AI to enhance, not replace, human work.
Automation curve: The trajectory along which a task becomes economically viable to automate as model capability rises and cost falls.
Task decomposition: Breaking a job into discrete tasks to assess which are exposed to automation.
Exposure (labour): The share of a job that current or near-future AI can perform; not the same as the share that will be automated.
Compute concentration: The empirical concentration of frontier-AI training in a handful of firms and countries that control the largest compute clusters.
Diffusion (technology): The rate at which a new technology spreads through the economy after invention; historically much slower than headlines suggest.
Solow paradox: The observation that productivity statistics lag visible technological progress, often for years after a major innovation.
Cognitive surplus: The freed human time and attention released when AI handles routine knowledge work.

Cognition and neuroscience adjacent

Working memory: The short-term capacity to hold and manipulate information; often loosely compared to an LLM's context window.
Fluid intelligence: The capacity to solve novel problems independent of prior knowledge.
Crystallised intelligence: The accumulated knowledge and skills built up over a lifetime.
Predictive coding: A neuroscientific theory that the brain continuously predicts its sensory inputs and updates on prediction error.
Global workspace theory: A theory of consciousness in which a 'workspace' broadcasts selected information across specialised modules.
Embodiment: The view that intelligence requires a body interacting with the physical world; a view challenged by the capabilities of purely text-trained systems.
Theory of mind: Modelling another agent's beliefs, desires, and intentions in order to predict their behaviour.
Metacognition: Thinking about one's own thinking — knowing what one knows and does not know.

Key acronyms and bodies

AISI: AI Safety Institute. Used in the UK (London) and US (NIST, Gaithersburg).
NIST: US National Institute of Standards and Technology, which hosts the US AISI and publishes the AI RMF.
OECD: Organisation for Economic Co-operation and Development. Hosts OECD.AI and the OECD AI Principles.
GovAI: Centre for the Governance of AI, an independent research centre focused on frontier-AI policy.
FAIR: Meta's Fundamental AI Research lab.
CSAIL: MIT's Computer Science and Artificial Intelligence Laboratory.
BAIR: Berkeley Artificial Intelligence Research.
HAI: Stanford's Institute for Human-Centered Artificial Intelligence.
CAIS: Center for AI Safety, a non-profit focused on technical and policy work on catastrophic risk.
METR: Model Evaluation and Threat Research, an independent autonomous-capability evaluator.
MIRI: Machine Intelligence Research Institute, the longest-running AGI-safety theory organisation.
ARC: Alignment Research Center, founded by Paul Christiano.

For a searchable, filterable version of this vocabulary, see the Glossary Search tool. For benchmark details, see the AGI benchmarks reference.