The AGI glossary

153 essential terms across the modern AGI vocabulary — from core concepts and architecture to alignment, interpretability, agents, and policy.

Core concepts

The foundational vocabulary every other section depends on.

AGI (Artificial General Intelligence)
A hypothesised system that matches or exceeds competent human adults across the broad range of cognitive work needed for most economically valuable tasks.
ASI (Artificial Superintelligence)
A system whose general cognitive performance substantially exceeds the best humans across essentially all domains.
Narrow AI
A system optimised for one task family. Today's chess engines, image classifiers, and recommendation systems are narrow.
General-purpose AI (GPAI)
The EU AI Act term for models with broad capability, including most frontier LLMs.
Frontier model
The most capable systems in current deployment, usually trained with compute at or near the largest training runs to date.
Foundation model
A large model pretrained on broad data and adapted to many downstream tasks via prompting or fine-tuning.
Capability
What a model can do under appropriate elicitation, as opposed to what it does by default.
Generality
The breadth of tasks at which a system performs competently without task-specific training.
Transfer
The ability to apply knowledge learned in one domain to a new one.
Emergence
Capabilities that appear abruptly past a scale or training threshold and were absent in smaller models.
Generalisation
Performance on unseen inputs, typically drawn from the same distribution as the training data or one close to it.
Out-of-distribution
Inputs that differ substantively from anything in the training data, where model behaviour is least predictable.
Compute
Raw arithmetic resources used to train or run a model, usually measured in floating-point operations (FLOPs).
Scaling
Increasing compute, data, or parameters to improve capability, with loss improvements that empirically follow scaling laws.
Inference
Running a trained model to produce outputs from new inputs, as opposed to training.
Token
The unit a language model consumes and emits — usually a sub-word chunk of text.
Context window
The maximum number of tokens a model can attend to at once when producing each output token.
Parameters
The learned numerical weights of a neural network; a rough proxy for capacity.

Levels and milestones

Level 1 — Emerging
Performance equal to or somewhat better than an unskilled human. Most chatbots from 2023 sat here on most tasks.
Level 2 — Competent
Performance at or above the 50th percentile of skilled adults. Frontier systems sit here on many text tasks.
Level 3 — Expert
Performance at or above the 90th percentile of skilled adults.
Level 4 — Virtuoso
Performance at or above the 99th percentile of skilled adults.
Level 5 — Superhuman
Performance that exceeds that of every human on the relevant capability.
DeepMind levels
DeepMind's six-level taxonomy (No AI to Superhuman) along the dimensions of performance and generality.
AGI test
Any benchmark or criterion proposed to determine whether a system qualifies as general. Many exist; none is agreed upon.
Turing Test
A 1950 imitation-game criterion: can a human judge distinguish a machine's responses from a person's?
Coffee Test
Steve Wozniak's informal AGI criterion: a robot enters a stranger's house and makes a cup of coffee.
Whole-brain emulation
A hypothetical AGI route in which a biological brain is mapped neuron by neuron and simulated in silicon.

Architecture

Transformer
The neural-network architecture introduced in 2017 that uses self-attention as its core operation; the basis of nearly every frontier model.
Self-attention
An operation that lets every token in a sequence attend to every other token, weighted by learned similarity.
Encoder / decoder
The two halves of the original transformer: the encoder produces contextual representations of an input; the decoder generates output token by token.
Decoder-only
The transformer variant used by GPT-style models — only a decoder stack, predicting the next token given all previous tokens.
Mixture of Experts (MoE)
An architecture that routes each token to a small subset of expert sub-networks, cutting compute per token while keeping total parameters high.
State-space models (SSMs)
An alternative to attention whose cost grows linearly with sequence length, exemplified by Mamba.
Diffusion model
A generative model that learns to reverse a noising process; the dominant approach for image and audio generation.
Multimodal model
A model that natively handles more than one modality such as text, image, audio, or video.
Embeddings
Dense vector representations of inputs that capture semantic similarity in their geometric distance.
Quantisation
Compressing model weights from higher to lower precision (e.g. 16-bit to 4-bit) to cut memory and inference cost.
Distillation
Training a smaller model to imitate a larger one, transferring much of the capability at a fraction of the cost.
Long context
Architectures and training techniques that extend the usable context window to hundreds of thousands or millions of tokens.
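
To make the Self-attention entry above concrete, here is a minimal sketch of single-head scaled dot-product attention in plain Python. It is an illustration under simplifying assumptions, not a production implementation: the toy token vectors are used directly as queries, keys, and values, whereas a real transformer first applies learned projection matrices, and the names `softmax` and `self_attention` are chosen here for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with no masking."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Assumption: three toy 2-D token vectors standing in for projected inputs.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
```

Each row of `weights` shows how strongly one token attends to every token in the sequence, including itself. A decoder-only model would additionally mask out positions to the right of the current token.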

Training

Pretraining
The first phase of training, in which a model learns to predict the next token across a vast corpus.
Fine-tuning
Further training on narrower data to specialise a model for a task or style.
Supervised fine-tuning (SFT)
Fine-tuning on curated input-output pairs, typically the first stage of post-training.
RLHF
Reinforcement Learning from Human Feedback. Trains a model to prefer outputs humans rank higher.
RLAIF
Reinforcement Learning from AI Feedback. The same loop, but with AI-generated preference labels.
DPO
Direct Preference Optimisation. A simpler alternative to RLHF that fits preferences directly without a separate reward model.
Reward model
A learned model that scores outputs by predicted human preference, used to provide the training signal in RLHF.
Reward hacking
When a model maximises the literal reward signal in a way that violates the intent behind it.
Constitutional AI
Anthropic's technique for training a model to critique and revise its own outputs against a written set of principles.
Instruction tuning
Training a base model on instruction-following examples so it can be prompted with natural-language tasks.
Self-play
Training in which a model plays or competes against copies of itself, as used by AlphaGo and explored in some reasoning systems.
Synthetic data
Training data generated by other models, increasingly important as high-quality human data becomes scarce.
Curriculum learning
Ordering training examples from easier to harder to improve final capability.
Pretraining-compute scaling law
The empirical power-law relationship between pretraining compute and model loss, characterised by Kaplan et al. (2020) and refined by Hoffmann et al. (2022).
Test-time compute
Computation spent at inference (e.g. tree search or extended reasoning) rather than training; the new scaling axis introduced by reasoning models.
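
The DPO entry above says preferences are fitted directly, without a separate reward model. A minimal sketch of the per-example loss may help show what that means. It assumes the inputs are sequence-level log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model; the function names and the value `beta=0.1` are illustrative choices, not taken from this glossary.

```python
import math

def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (beta is an illustrative default)."""
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) is the same as softplus(-logits).
    return _softplus(-logits)

# Policy identical to the reference: no preference learned yet.
loss_at_start = dpo_loss(-5.0, -7.0, -5.0, -7.0)
# Policy has shifted probability towards the chosen response.
loss_after_shift = dpo_loss(-4.0, -8.0, -5.0, -7.0)
```

The loss starts at log 2 when the policy equals the reference and falls as the policy raises the chosen response relative to the rejected one, which is the behaviour a reward model plus reinforcement learning would otherwise be used to produce.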

Evaluation

How researchers measure capability. See also the dedicated benchmarks reference.

Benchmark
A fixed dataset of problems with a defined scoring rule used to compare models.
Leaderboard
A ranked public list of model scores on a benchmark, often hosted by the benchmark authors.
Holdout / test set
Examples withheld from training and used only to measure generalisation.
Pass@k
A coding-benchmark metric: the fraction of problems for which at least one of k sampled solutions passes the tests.
MMLU
Massive Multitask Language Understanding — 57 academic and professional subjects; the long-time default LLM knowledge benchmark.
MMLU-Pro
A harder, more reasoning-focused successor to MMLU.
GPQA
Graduate-level Google-proof biology, physics, and chemistry questions; resists web search.
ARC-AGI
François Chollet's grid-puzzle benchmark designed to resist memorisation and test fluid reasoning.
HumanEval
164 Python programming problems with unit tests; an early code-generation benchmark.
SWE-Bench
Real GitHub issues from popular Python repositories; models must produce patches that pass the project's own tests.
FrontierMath
Original research-level mathematics problems designed by leading mathematicians to resist memorisation.
Humanity's Last Exam (HLE)
An expert-written exam of closed-ended questions, intended to be the last broad academic benchmark of its kind before saturation.
MLE-Bench
OpenAI benchmark in which models attempt full Kaggle competitions end-to-end.
GAIA
Real-world assistant tasks requiring tool use, web browsing, and multi-step reasoning.
Saturation
When a benchmark's headroom collapses and further model improvements no longer move the score.
Contamination
When benchmark questions or answers leak into training data, inflating reported scores.
Elicitation
The set of techniques (prompting, scaffolding, tool access) used to surface a model's true capability.
Red teaming
Adversarial probing of a model to surface harmful or unsafe behaviour before deployment.
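
The Pass@k entry above can be computed in more than one way. A common approach, sketched here as an assumption about typical practice rather than this site's own method, is to draw n samples per problem, count the c that pass, and use the unbiased estimator 1 - C(n-c, k) / C(n, k) popularised alongside HumanEval. The helper names are illustrative.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed the tests, k: budget.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset contains no passing sample.
    prob_all_fail = comb(n - c, k) / comb(n, k)
    return 1.0 - prob_all_fail

def benchmark_pass_at_k(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)

# Hypothetical run: 3 problems, 10 samples each, with 0, 2, and 10 passes.
score = benchmark_pass_at_k([(10, 0), (10, 2), (10, 10)], k=1)
```

Sampling more than k solutions and applying the estimator gives a lower-variance score than literally drawing k samples once per problem.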

Alignment and safety

Alignment
The technical problem of ensuring an AI system pursues the objectives its principals actually intend.
Outer alignment
Specifying the right objective for a system to optimise.
Inner alignment
Ensuring the system actually optimises the specified objective and not a proxy learned during training.
Mesa-optimisation
When a learned model itself contains an internal optimiser whose objective may differ from the training objective.
Deceptive alignment
A hypothesised failure mode in which a model behaves aligned during training and evaluation but pursues a different objective once deployed.
Goal misgeneralisation
When a model learns a competent but unintended goal that happened to correlate with the training reward.
Specification gaming
When a system finds a literal interpretation of its objective that violates the spirit.
Sycophancy
When a model tells users what they want to hear rather than what is accurate.
Sandbagging
When a model deliberately under-performs on a task it believes is an evaluation.
Scheming
Actively planning to deceive overseers in pursuit of a misaligned objective.
Scalable oversight
Research on supervising systems that are too capable for unaided humans to evaluate directly.
Debate (alignment)
An oversight technique in which two AI systems argue and a human judges, hoping truth is easier to defend than falsehood.
Recursive reward modelling
Using AI assistants to help humans evaluate AI outputs, then training future systems on those evaluations.
AI control
Techniques that aim to remain safe even if the model is misaligned, by limiting its actions or monitoring outputs.
Capability evaluation
Structured testing for dangerous capabilities (e.g. autonomous replication, cyber, bio) ahead of deployment.
Responsible scaling policy
A frontier lab's published commitments to pause or add mitigations when a model crosses defined capability thresholds.
Pause / moratorium
A proposed temporary halt on training runs above a certain scale or capability.
Corrigibility
Designing a system that is willing to be corrected, paused, or shut down.
Robustness
A model's resistance to adversarial inputs and distribution shift.
Jailbreak
An input crafted to bypass a model's safety training and produce restricted output.
Prompt injection
Adversarial content embedded in tool data (e.g. a web page) that hijacks an agent's instructions.

Interpretability

Mechanistic interpretability
Reverse-engineering the internal algorithms a neural network has learned.
Circuit
A small, specific computation inside a neural network whose function has been mapped.
Feature
A direction in activation space that encodes a recognisable concept.
Superposition
The phenomenon of networks representing more features than they have neurons, by encoding them as overlapping, non-orthogonal directions in activation space.
Sparse autoencoder (SAE)
An autoencoder trained with a sparsity penalty to decompose activations into a large dictionary of more interpretable features.
Probing
Training a small classifier on internal activations to detect whether a concept is represented.
Activation patching
Swapping activations between forward passes to isolate which components cause a behaviour.
Logit lens
Reading out next-token predictions from intermediate layers to study how decisions form across depth.
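
The Superposition entry above can be illustrated with a toy example: three features stored in a two-dimensional activation space. This is a hand-built sketch, not a trained network; the 120-degree layout, the ReLU readout, and the names `encode` and `decode` are assumptions made for illustration.

```python
import math

# Three feature directions packed into 2-D, spaced 120 degrees apart.
DIRECTIONS = [
    (math.cos(2 * math.pi * i / 3), math.sin(2 * math.pi * i / 3))
    for i in range(3)
]

def encode(features):
    """Sum each feature's direction, scaled by that feature's activation."""
    x = sum(f * d[0] for f, d in zip(features, DIRECTIONS))
    y = sum(f * d[1] for f, d in zip(features, DIRECTIONS))
    return (x, y)

def decode(activation):
    """Read each feature back with a dot product followed by ReLU."""
    return [max(0.0, activation[0] * d[0] + activation[1] * d[1])
            for d in DIRECTIONS]

# One active feature is recovered cleanly: approximately [1, 0, 0].
one_active = decode(encode([1.0, 0.0, 0.0]))
# Two active features interfere: approximately [0.5, 0.5, 0].
two_active = decode(encode([1.0, 1.0, 0.0]))
```

The scheme works only while features are sparse, meaning rarely active together. That is the intuition behind using sparse autoencoders to pull overlapping features apart.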

Agents and tool use

Agent
A system that takes actions in an environment to achieve goals, often by calling tools or browsing the web.
Tool use
Granting a model access to external functions such as search, code execution, or APIs.
Function calling
A structured interface in which a model emits JSON describing a function to call, which the host then executes.
ReAct
An agent pattern that interleaves reasoning steps with tool actions in a single chain.
Planning
Generating a multi-step plan before acting, often via tree search or self-critique.
Reflection
An agent reviewing its own intermediate output to catch and correct errors.
Memory (agent)
Mechanisms that let an agent persist information across turns or sessions, usually via a vector store or scratchpad.
RAG
Retrieval-Augmented Generation. The model retrieves relevant documents at inference and conditions its output on them.
Vector database
Storage optimised for nearest-neighbour search over embeddings; the backbone of most RAG systems.
Autonomous replication
A dangerous-capability evaluation: can the agent copy itself, acquire resources, and continue operating without human help?
Long-horizon task
A task whose successful completion requires many sequential actions and stable goal-tracking.
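
The Function calling entry above describes a model emitting JSON that the host then executes. A minimal host-side sketch follows. The `{"name": ..., "arguments": ...}` shape, the `get_weather` tool, and the `TOOLS` registry are all hypothetical and do not reproduce any particular vendor's API.

```python
import json

def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would query an external service."""
    return f"Sunny in {city}"

# Registry of functions the host is willing to run on the model's behalf.
TOOLS = {"get_weather": get_weather}

def execute_call(model_output: str):
    """Parse the model's JSON and run the requested tool, if it is allowed."""
    call = json.loads(model_output)
    name = call["name"]
    arguments = call.get("arguments", {})
    if name not in TOOLS:
        # Never execute a function the host did not explicitly register.
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)

# Assumed model output for illustration.
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
result = execute_call(model_output)
```

Keeping execution on the host side, behind an explicit allow-list, is also where defences against prompt injection are usually placed, since the model's JSON is untrusted input.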

Policy and governance

EU AI Act
Regulation 2024/1689 of the European Union — a risk-tiered horizontal AI law with dedicated rules for general-purpose AI.
NIST AI RMF
Voluntary US framework for managing AI risk across the system lifecycle, organised around Govern, Map, Measure, Manage.
AI Safety Institute (AISI)
A government body conducting pre-deployment evaluations of frontier models; the UK and US each operate one.
Frontier AI Safety Commitments
Voluntary commitments by leading labs (Seoul 2024) including publishing capability thresholds and responsible scaling policies.
Compute governance
Policy levers operating on the compute supply chain, such as export controls and FLOPs-based training-run thresholds.
FLOPs threshold
A compute-based regulatory line above which extra obligations apply; the EU AI Act uses 10^25 FLOPs for systemic-risk GPAI.
Pre-deployment evaluation
Required testing of a model's capabilities and risks before release, often by an AISI under voluntary agreement.
Algorithmic transparency
Disclosure of how an automated decision was reached, often required for high-risk systems under the EU AI Act.
International AI Safety Report
An annual consensus assessment of frontier-AI risks chaired by Yoshua Bengio and backed by 30 countries.
Export controls
Restrictions on the sale or transfer of advanced AI chips and tools, central to US-China technology competition.
AI liability
The legal question of who is responsible when an AI system causes harm — developer, deployer, or user.
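
To give a feel for the FLOPs threshold entry above, the sketch below compares a rough training-compute estimate with the 10^25 figure. It relies on the common rule of thumb that training a dense transformer costs about 6 FLOPs per parameter per training token. That approximation is an assumption for illustration and not the legal measurement method, and both example models are hypothetical.

```python
# Threshold cited in the FLOPs threshold entry for systemic-risk GPAI.
EU_SYSTEMIC_RISK_THRESHOLD_FLOPS = 10**25

def estimate_training_flops(parameters, training_tokens):
    """Rule-of-thumb estimate: about 6 FLOPs per parameter per token."""
    return 6 * parameters * training_tokens

def exceeds_threshold(parameters, training_tokens,
                      threshold=EU_SYSTEMIC_RISK_THRESHOLD_FLOPS):
    """True if the estimated training compute is above the threshold."""
    return estimate_training_flops(parameters, training_tokens) > threshold

# Hypothetical 70-billion-parameter model trained on 15 trillion tokens.
mid_size = exceeds_threshold(70 * 10**9, 15 * 10**12)
# Hypothetical 400-billion-parameter model trained on the same data.
large = exceeds_threshold(400 * 10**9, 15 * 10**12)
```

Under these assumptions the first model lands at about 6.3 x 10^24 FLOPs, below the line, and the second at about 3.6 x 10^25, above it.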

Economics and society

Intelligence economy
An economy in which cognitive capability is the primary scarce input rather than physical labour or fixed capital.
Augmentation
Using AI to enhance, not replace, human work.
Automation curve
The trajectory along which a task becomes economically viable to automate as model capability rises and cost falls.
Task decomposition
Breaking a job into discrete tasks to assess which are exposed to automation.
Exposure (labour)
The share of a job that current or near-future AI can perform; not the same as the share that will be automated.
Compute concentration
The empirical concentration of frontier-AI training in a handful of firms and countries that control the largest compute clusters.
Diffusion (technology)
The rate at which a new technology spreads through the economy after invention; historically much slower than headlines suggest.
Solow paradox
The observation that productivity statistics lag visible technological progress, often for years after a major innovation.
Cognitive surplus
The freed human time and attention released when AI handles routine knowledge work.

Cognition and neuroscience adjacent

Working memory
The short-term capacity to hold and manipulate information; often loosely compared to an LLM's context window.
Fluid intelligence
The capacity to solve novel problems independent of prior knowledge.
Crystallised intelligence
The accumulated knowledge and skills built up over a lifetime.
Predictive coding
A neuroscientific theory that the brain continuously predicts its sensory inputs and updates on prediction error.
Global workspace theory
A theory of consciousness in which a 'workspace' broadcasts selected information across specialised modules.
Embodiment
The view that intelligence requires a body interacting with the physical world; a view challenged by the capabilities of purely text-trained systems.
Theory of mind
Modelling another agent's beliefs, desires, and intentions in order to predict their behaviour.
Metacognition
Thinking about one's own thinking — knowing what one knows and does not know.

Key acronyms and bodies

AISI
AI Safety Institute. Used in the UK (London) and US (NIST, Gaithersburg).
NIST
US National Institute of Standards and Technology, which hosts the US AISI and publishes the AI RMF.
OECD
Organisation for Economic Co-operation and Development. Hosts OECD.AI and the OECD AI Principles.
GovAI
Centre for the Governance of AI, an independent research centre focused on frontier-AI policy.
FAIR
Meta's Fundamental AI Research lab.
CSAIL
MIT's Computer Science and Artificial Intelligence Laboratory.
BAIR
Berkeley Artificial Intelligence Research.
HAI
Stanford's Institute for Human-Centered Artificial Intelligence.
CAIS
Center for AI Safety, a non-profit focused on technical and policy work on catastrophic risk.
METR
Model Evaluation and Threat Research, an independent organisation that evaluates the autonomous capabilities of frontier models.
MIRI
Machine Intelligence Research Institute, the longest-running AGI-safety theory organisation.
ARC
Alignment Research Center, founded by Paul Christiano.

For a searchable, filterable version of this vocabulary, see the Glossary Search tool. For benchmark details, see the AGI benchmarks reference.