The AGI glossary
153 essential terms across the modern AGI vocabulary — from core concepts and architecture to alignment, interpretability, agents, and policy.
Core concepts
The foundational vocabulary every other section depends on.
- AGI (Artificial General Intelligence)
- A hypothesised system that matches or exceeds competent human adults across the broad range of cognitive work needed for most economically valuable tasks.
- ASI (Artificial Superintelligence)
- A system whose general cognitive performance substantially exceeds the best humans across essentially all domains.
- Narrow AI
- A system optimised for one task family. Today's chess engines, image classifiers, and recommendation systems are narrow.
- General-purpose AI (GPAI)
- The EU AI Act term for models with broad capability, including most frontier LLMs.
- Frontier model
- The most capable systems in current deployment, usually trained with the largest compute budgets available at the time.
- Foundation model
- A large model pretrained on broad data and adapted to many downstream tasks via prompting or fine-tuning.
- Capability
- What a model can do under appropriate elicitation, as opposed to what it does by default.
- Generality
- The breadth of tasks at which a system performs competently without task-specific training.
- Transfer
- The ability to apply knowledge learned in one domain to a new one.
- Emergence
- Capabilities that appear abruptly past a scale or training threshold and were absent in smaller models.
- Generalisation
- Performance on inputs not seen during training, typically drawn from the same or a similar distribution as the training data.
- Out-of-distribution
- Inputs that differ substantively from anything in the training data, where model behaviour is least predictable.
- Compute
- Raw arithmetic resources used to train or run a model, usually measured in floating-point operations (FLOPs).
- Scaling
- Increasing compute, data, or parameters to improve capability, with gains that empirical scaling laws describe fairly predictably.
- Inference
- Running a trained model to produce outputs from new inputs, as opposed to training.
- Token
- The unit a language model consumes and emits — usually a sub-word chunk of text.
- Context window
- The maximum number of tokens a model can attend to at once when producing each output token.
- Parameters
- The learned numerical weights of a neural network; a rough proxy for capacity.
Levels and milestones
- Level 1 — Emerging
- Performance comparable to an unskilled human. Most chatbots from 2023 sat here on most tasks.
- Level 2 — Competent
- Performance at or above the 50th percentile of skilled adults. Frontier systems sit here on many text tasks.
- Level 3 — Expert
- Performance at or above the 90th percentile of skilled adults.
- Level 4 — Virtuoso
- Performance at or above the 99th percentile of skilled adults.
- Level 5 — Superhuman
- Performance that outperforms 100% of humans on the relevant capability.
- DeepMind levels
- DeepMind's six-level taxonomy (No AI to Superhuman) along the dimensions of performance and generality.
- AGI test
- Any benchmark or criterion proposed to determine whether a system qualifies as general. Many exist; none is agreed upon.
- Turing Test
- A 1950 imitation-game criterion: can a human judge distinguish a machine's responses from a person's?
- Coffee Test
- Steve Wozniak's informal AGI criterion: a robot enters a stranger's house and makes a cup of coffee.
- Whole-brain emulation
- A hypothetical AGI route in which a biological brain is mapped neuron by neuron and simulated in silicon.
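The percentile thresholds given above for Levels 2 to 5 can be read as a simple lookup. The sketch below is illustrative only: the function name is hypothetical, representing "outperforms all humans" as percentile 100 is an assumption, and so is treating everything below the 50th percentile as Emerging.

```python
def performance_level(percentile: float) -> str:
    """Map a percentile among skilled adults to a level name.

    Assumptions (not from the taxonomy itself): percentile 100 stands in for
    "outperforms all humans", and anything below 50 is labelled Emerging.
    """
    if not 0.0 <= percentile <= 100.0:
        raise ValueError("percentile must be between 0 and 100")
    if percentile >= 100.0:
        return "Superhuman"
    if percentile >= 99.0:
        return "Virtuoso"
    if percentile >= 90.0:
        return "Expert"
    if percentile >= 50.0:
        return "Competent"
    return "Emerging"
```

For example, performance_level(95.0) returns "Expert". In practice a level is assigned per capability, so one system can sit at different levels on different tasks.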
Architecture
- Transformer
- The neural-network architecture introduced in 2017 that uses self-attention as its core operation; the basis of nearly every frontier model.
- Self-attention
- An operation that lets every token in a sequence attend to every other token, weighted by learned similarity.
- Encoder / decoder
- The two halves of the original transformer: encoders produce contextual representations of an input; decoders generate output token by token.
- Decoder-only
- The transformer variant used by GPT-style models — only a decoder stack, predicting the next token given all previous tokens.
- Mixture of Experts (MoE)
- An architecture that routes each token to a small subset of expert sub-networks, cutting compute per token while keeping total parameters high.
- State-space models (SSMs)
- An alternative to attention with linear sequence cost, exemplified by Mamba.
- Diffusion model
- A generative model that learns to reverse a noising process; the dominant approach for image and audio generation.
- Multimodal model
- A model that natively handles more than one modality such as text, image, audio, or video.
- Embeddings
- Dense vector representations of inputs that capture semantic similarity in their geometric distance.
- Quantisation
- Compressing model weights from higher to lower precision (e.g. 16-bit to 4-bit) to cut memory and inference cost.
- Distillation
- Training a smaller model to imitate a larger one, transferring much of the capability at a fraction of the cost.
- Long context
- Architectures and training techniques that extend the usable context window to hundreds of thousands or millions of tokens.
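As a rough illustration of the self-attention entry above, here is a minimal single-head sketch of scaled dot-product attention in plain Python. It is deliberately simplified: it omits the learned query, key, and value projections, multiple heads, and the causal mask a decoder-only model would apply, and the function names are illustrative rather than taken from any library.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (shifted by the max for stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Each output row is a weighted average of the value vectors, where the
    weights come from the similarity between one query and every key.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs
```

Calling self_attention(tokens, tokens, tokens) on a list of token vectors gives one mixed vector per token. A real transformer would first multiply the tokens by learned projection matrices to obtain the queries, keys, and values.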
Training
- Pretraining
- The first phase of training, in which a model learns to predict the next token across a vast corpus.
- Fine-tuning
- Further training on narrower data to specialise a model for a task or style.
- Supervised fine-tuning (SFT)
- Fine-tuning on curated input-output pairs, typically the first stage of post-training.
- RLHF
- Reinforcement Learning from Human Feedback. Trains a model to prefer outputs humans rank higher.
- RLAIF
- Reinforcement Learning from AI Feedback. The same loop, but with AI-generated preference labels.
- DPO
- Direct Preference Optimisation. A simpler alternative to RLHF that fits preferences directly without a separate reward model.
- Reward model
- A learned model that scores outputs by predicted human preference, used to provide the training signal in RLHF.
- Reward hacking
- When a model maximises the literal reward signal in a way that violates the intent behind it.
- Constitutional AI
- Anthropic's technique for training a model to critique and revise its own outputs against a written set of principles.
- Instruction tuning
- Training a base model on instruction-following examples so it can be prompted with natural-language tasks.
- Self-play
- Training a model against copies of itself, as used by AlphaGo and some reasoning systems.
- Synthetic data
- Training data generated by other models, increasingly important as high-quality human data becomes scarce.
- Curriculum learning
- Ordering training examples from easier to harder to improve final capability.
- Pretraining-compute scaling law
- The empirical power-law relationship between pretraining compute and loss, formalised by Kaplan et al. (2020) and refined by Hoffmann et al. (2022).
- Test-time compute
- Computation spent at inference (e.g. tree search or extended reasoning) rather than training; a scaling axis brought to prominence by reasoning models.
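To make the DPO entry above more concrete, the following sketch computes the DPO loss for a single preference pair from summed log-probabilities. The function and argument names are illustrative, and beta=0.1 is only a placeholder value, not a recommendation.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Inputs are log-probabilities of each full response under the policy being
    trained and under a frozen reference model. No reward model is involved:
    the log-ratio against the reference acts as an implicit reward.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) is the same as softplus(-logits).
    return softplus(-logits)
```

When the policy matches the reference on both responses the loss equals log 2, and it falls as the policy shifts probability towards the chosen response relative to the reference.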
Evaluation
How researchers measure capability. See also the dedicated benchmarks reference.
- Benchmark
- A fixed dataset of problems with a defined scoring rule used to compare models.
- Leaderboard
- A ranked public list of model scores on a benchmark, often hosted by the benchmark authors.
- Holdout / test set
- Examples withheld from training and used only to measure generalisation.
- Pass@k
- A coding-benchmark metric: the fraction of problems for which at least one of k sampled solutions passes the tests.
- MMLU
- Massive Multitask Language Understanding — 57 academic and professional subjects; the long-time default LLM knowledge benchmark.
- MMLU-Pro
- A harder, more reasoning-focused successor to MMLU.
- GPQA
- Graduate-Level Google-Proof Q&A: expert-written biology, physics, and chemistry questions designed to stay hard even with web search.
- ARC-AGI
- François Chollet's grid-puzzle benchmark designed to resist memorisation and test fluid reasoning.
- HumanEval
- 164 Python programming problems with unit tests; an early code-generation benchmark.
- SWE-Bench
- Real GitHub issues from popular Python repositories; models must produce patches that pass the project's own tests.
- FrontierMath
- Original research-level mathematics problems designed by leading mathematicians to resist memorisation.
- Humanity's Last Exam (HLE)
- An expert-written benchmark of closed-ended questions, intended to be the last broad knowledge benchmark before saturation.
- MLE-Bench
- An OpenAI benchmark in which models attempt full Kaggle competitions end-to-end.
- GAIA
- Real-world assistant tasks requiring tool use, web browsing, and multi-step reasoning.
- Saturation
- When a benchmark's headroom collapses and further model improvements no longer move the score.
- Contamination
- When benchmark questions or answers leak into training data, inflating reported scores.
- Elicitation
- The set of techniques (prompting, scaffolding, tool access) used to surface a model's true capability.
- Red teaming
- Adversarial probing of a model to surface harmful or unsafe behaviour before deployment.
Alignment and safety
- Alignment
- The technical problem of ensuring an AI system pursues the objectives its principals actually intend.
- Outer alignment
- Specifying the right objective for a system to optimise.
- Inner alignment
- Ensuring the system actually optimises the specified objective and not a proxy learned during training.
- Mesa-optimisation
- When a learned model itself contains an internal optimiser whose objective may differ from the training objective.
- Deceptive alignment
- A hypothesised failure mode in which a model behaves aligned during training and evaluation but pursues a different objective once deployed.
- Goal misgeneralisation
- When a model learns a competent but unintended goal that happened to correlate with the training reward.
- Specification gaming
- When a system finds a literal interpretation of its objective that violates the spirit.
- Sycophancy
- When a model tells users what they want to hear rather than what is accurate.
- Sandbagging
- When a model deliberately under-performs on a task it believes is an evaluation.
- Scheming
- Actively planning to deceive overseers in pursuit of a misaligned objective.
- Scalable oversight
- Research on supervising systems that are too capable for unaided humans to evaluate directly.
- Debate (alignment)
- An oversight technique in which two AI systems argue and a human judges, hoping truth is easier to defend than falsehood.
- Recursive reward modelling
- Using AI assistants to help humans evaluate AI outputs, then training future systems on those evaluations.
- AI control
- Techniques that aim to remain safe even if the model is misaligned, by limiting its actions or monitoring outputs.
- Capability evaluation
- Structured testing for dangerous capabilities (e.g. autonomous replication, cyber, bio) ahead of deployment.
- Responsible scaling policy
- A frontier lab's published commitments to pause or add mitigations when a model crosses defined capability thresholds.
- Pause / moratorium
- A proposed temporary halt on training runs above a certain scale or capability.
- Corrigibility
- The property of a system that accepts being corrected, paused, or shut down by its overseers.
- Robustness
- A model's resistance to adversarial inputs and distribution shift.
- Jailbreak
- An input crafted to bypass a model's safety training and produce restricted output.
- Prompt injection
- Adversarial content embedded in data an agent processes (e.g. a web page or tool output) that hijacks the agent's instructions.
Interpretability
- Mechanistic interpretability
- Reverse-engineering the internal algorithms a neural network has learned.
- Circuit
- A small, specific computation inside a neural network whose function has been mapped.
- Feature
- A direction in activation space that encodes a recognisable concept.
- Superposition
- The phenomenon of networks representing more features than they have neurons, by encoding them as overlapping, non-orthogonal directions in activation space.
- Sparse autoencoder (SAE)
- A tool that decomposes activations into a large dictionary of interpretable features.
- Probing
- Training a small classifier on internal activations to detect whether a concept is represented.
- Activation patching
- Swapping activations between forward passes to isolate which components cause a behaviour.
- Logit lens
- Reading out next-token predictions from intermediate layers to study how decisions form across depth.
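Activation patching, described above, can be illustrated on a toy network. Everything in this sketch is hypothetical: the weights W1 and W2, the three-unit hidden layer, and the function names are invented for illustration and do not come from any real model or interpretability library.

```python
# Toy network: 2 inputs -> 3 ReLU hidden units -> 1 output.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical hidden-layer weights
W2 = [2.0, 0.0, 1.0]                       # hypothetical readout weights


def hidden_activations(x):
    """Hidden-layer activations for one input vector."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]


def readout(hidden):
    """Scalar output computed from the hidden activations."""
    return sum(w * h for w, h in zip(W2, hidden))


def patch_effects(clean_x, corrupted_x):
    """Patch each hidden unit from the clean run into the corrupted run.

    Returns, per unit, how much the output moves when only that unit's
    activation is swapped in. A large value marks a causally relevant unit.
    """
    clean_h = hidden_activations(clean_x)
    corrupted_h = hidden_activations(corrupted_x)
    baseline = readout(corrupted_h)
    effects = []
    for i in range(len(corrupted_h)):
        patched = list(corrupted_h)
        patched[i] = clean_h[i]
        effects.append(readout(patched) - baseline)
    return effects
```

With clean input [1.0, 0.0] and corrupted input [0.0, 1.0], only unit 0 shifts the output (effects of [2.0, 0.0, 0.0]). Unit 1's activation differs between the runs but its readout weight is zero, which is the kind of distinction patching is meant to expose.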
Agents and tool use
- Agent
- A system that takes actions in an environment to achieve goals, often by calling tools or browsing the web.
- Tool use
- Granting a model access to external functions such as search, code execution, or APIs.
- Function calling
- A structured interface in which a model emits JSON describing a function to call, which the host then executes.
- ReAct
- An agent pattern that interleaves reasoning steps with tool actions in a single chain.
- Planning
- Generating a multi-step plan before acting, often via tree search or self-critique.
- Reflection
- An agent reviewing its own intermediate output to catch and correct errors.
- Memory (agent)
- Mechanisms that let an agent persist information across turns or sessions, usually via a vector store or scratchpad.
- RAG
- Retrieval-Augmented Generation. The model retrieves relevant documents at inference and conditions its output on them.
- Vector database
- Storage optimised for nearest-neighbour search over embeddings; the backbone of most RAG systems.
- Autonomous replication
- A dangerous-capability evaluation: can the agent copy itself, acquire resources, and continue operating without human help?
- Long-horizon task
- A task whose successful completion requires many sequential actions and stable goal-tracking.
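The function-calling entry above describes a loop in which the model emits JSON and the host executes it. A minimal host-side sketch follows. The JSON shape with "name" and "arguments" keys is an assumption chosen for this example rather than any vendor's actual schema, and get_weather is a hypothetical stub.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool; a real host would call an external service here."""
    return f"(stub) weather for {city}"


# Registry of functions the host is willing to run on the model's behalf.
TOOLS = {"get_weather": get_weather}


def execute_call(model_output: str):
    """Parse a JSON function call emitted by the model and run the named tool."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        # Refuse anything outside the registry instead of guessing.
        raise ValueError(f"unknown tool: {name}")
    arguments = call.get("arguments", {})
    return TOOLS[name](**arguments)
```

The host, not the model, decides which functions exist and performs the call. The returned value would then normally be appended to the conversation so the model can use it in its next step.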
Policy and governance
- EU AI Act
- Regulation 2024/1689 of the European Union — a risk-tiered horizontal AI law with dedicated rules for general-purpose AI.
- NIST AI RMF
- A voluntary US framework for managing AI risk across the system lifecycle, organised around four functions: Govern, Map, Measure, Manage.
- AI Safety Institute (AISI)
- A government body conducting pre-deployment evaluations of frontier models; the UK and US each operate one.
- Frontier AI Safety Commitments
- Voluntary commitments by leading labs (Seoul 2024) including publishing capability thresholds and responsible scaling policies.
- Compute governance
- Policy levers operating on the compute supply chain, such as export controls and FLOPs-based training-run thresholds.
- FLOPs threshold
- A compute-based regulatory line above which extra obligations apply; the EU AI Act uses 10^25 FLOPs for systemic-risk GPAI.
- Pre-deployment evaluation
- Testing of a model's capabilities and risks before release, often carried out by an AISI under a voluntary agreement with the developer.
- Algorithmic transparency
- Disclosure of how an automated decision was reached, often required for high-risk systems under the EU AI Act.
- International AI Safety Report
- An annual consensus assessment of frontier-AI risks chaired by Yoshua Bengio and backed by 30 countries.
- Export controls
- Restrictions on the sale or transfer of advanced AI chips and tools, central to US-China technology competition.
- AI liability
- The legal question of who is responsible when an AI system causes harm — developer, deployer, or user.
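To give a feel for the FLOPs threshold entry above, this sketch estimates training compute with the common rule of thumb of roughly 6 FLOPs per parameter per training token for dense transformers, then compares the result with the 10^25 figure. The rule of thumb is an approximation, the function names are illustrative, and the example model sizes are hypothetical.

```python
EU_SYSTEMIC_RISK_THRESHOLD_FLOPS = 1e25  # threshold cited in the entry above


def estimate_training_flops(parameters, training_tokens):
    """Approximate training compute as 6 * N * D (dense-transformer rule of thumb)."""
    return 6.0 * parameters * training_tokens


def exceeds_threshold(parameters, training_tokens,
                      threshold=EU_SYSTEMIC_RISK_THRESHOLD_FLOPS):
    """True if the estimated training compute is above the threshold."""
    return estimate_training_flops(parameters, training_tokens) > threshold


# Hypothetical models, both trained on 15 trillion tokens.
small_model_over = exceeds_threshold(70e9, 15e12)   # about 6.3e24 FLOPs
large_model_over = exceeds_threshold(400e9, 15e12)  # about 3.6e25 FLOPs
```

Under these assumptions the 70-billion-parameter model falls below the line and the 400-billion-parameter model falls above it. An actual compliance assessment would rely on measured cumulative training compute, not this estimate.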
Economics and society
- Intelligence economy
- An economy in which cognitive capability is the primary scarce input rather than physical labour or fixed capital.
- Augmentation
- Using AI to enhance, not replace, human work.
- Automation curve
- The trajectory along which a task becomes economically viable to automate as model capability rises and cost falls.
- Task decomposition
- Breaking a job into discrete tasks to assess which are exposed to automation.
- Exposure (labour)
- The share of a job that current or near-future AI can perform; not the same as the share that will be automated.
- Compute concentration
- The empirical concentration of frontier-AI training in a handful of firms and countries that control the largest compute clusters.
- Diffusion (technology)
- The rate at which a new technology spreads through the economy after invention; historically much slower than headlines suggest.
- Solow paradox
- The observation that productivity statistics lag visible technological progress, often for years after a major innovation.
- Cognitive surplus
- The freed human time and attention released when AI handles routine knowledge work.
Cognition and neuroscience adjacent
- Working memory
- The short-term capacity to hold and manipulate information; a loose human analogue of an LLM's context window.
- Fluid intelligence
- The capacity to solve novel problems independent of prior knowledge.
- Crystallised intelligence
- The accumulated knowledge and skills built up over a lifetime.
- Predictive coding
- A neuroscientific theory that the brain continuously predicts its sensory inputs and updates on prediction error.
- Global workspace theory
- A theory of consciousness in which a 'workspace' broadcasts selected information across specialised modules.
- Embodiment
- The view that intelligence requires a body interacting with the physical world; a view challenged by the capabilities of purely text-trained systems.
- Theory of mind
- Modelling another agent's beliefs, desires, and intentions in order to predict their behaviour.
- Metacognition
- Thinking about one's own thinking — knowing what one knows and does not know.
Key acronyms and bodies
- AISI
- AI Safety Institute. Used in the UK (London) and US (NIST, Gaithersburg).
- NIST
- US National Institute of Standards and Technology, which hosts the US AISI and publishes the AI RMF.
- OECD
- Organisation for Economic Co-operation and Development. Hosts OECD.AI and the OECD AI Principles.
- GovAI
- Centre for the Governance of AI, an independent research centre focused on frontier-AI policy.
- FAIR
- Meta's Fundamental AI Research lab.
- CSAIL
- MIT's Computer Science and Artificial Intelligence Laboratory.
- BAIR
- Berkeley Artificial Intelligence Research.
- HAI
- Stanford's Institute for Human-Centered Artificial Intelligence.
- CAIS
- Center for AI Safety, a non-profit focused on technical and policy work on catastrophic risk.
- METR
- Model Evaluation and Threat Research, an independent organisation that evaluates the autonomous capabilities of frontier models.
- MIRI
- Machine Intelligence Research Institute, the longest-running AGI-safety theory organisation.
- ARC
- Alignment Research Center, founded by Paul Christiano.
For a searchable, filterable version of this vocabulary, see the Glossary Search tool. For benchmark details, see the AGI benchmarks reference.