Safety & alignment organisations
Independent labs and state-run institutes that evaluate frontier models and research how to align them.
AI safety research splits roughly into three camps: theoretical alignment (how would we make a superhuman system reliably do what we want?), empirical alignment (what works on today's frontier models?), and evaluation (how dangerous are current systems, really?). The organisations below span all three. Several are now embedded in national-security infrastructure through formal evaluation agreements with frontier labs.
- 01
Machine Intelligence Research Institute (MIRI)
2000 · Berkeley, USA
The longest-running AGI-safety research organisation. Pioneered formal work on decision theory, corrigibility, and misalignment risk.
Agent foundations · Theory
- 02
Alignment Research Center (ARC)
2021 · Berkeley, USA
Theory and evaluations group founded by former OpenAI alignment lead Paul Christiano. Its evaluations team spun out as METR, a dangerous-capability evaluation organisation.
Eliciting Latent Knowledge · Evals
- 03
METR
2023 · Berkeley, USA
Model Evaluation and Threat Research. Conducts independent autonomous-capability evaluations of frontier models, including for OpenAI, Anthropic, and the US AISI.
Autonomy evals · Pre-deployment
- 04
Apollo Research
2023 · London, UK
Specialises in evaluating frontier models for deceptive and scheming behaviour, and in publishing case studies used by safety institutes worldwide.
Deception · Scheming evals
- 05
Redwood Research
2021 · Berkeley, USA
Empirical alignment lab focused on AI control: techniques that work even if models are misaligned. Co-publishes prominent work with Anthropic.
Control · Adversarial training
- 06
UK AI Security Institute (AISI)
2023 · London, UK
Government institute conducting pre-deployment evaluations of frontier models on behalf of the United Kingdom. The first state-run frontier-model evaluator.
National evals · Standards
- 07
US AI Safety Institute (US AISI)
2024 · Gaithersburg, USA
Housed at NIST. Develops US technical standards for frontier-model evaluation and red-teaming, and holds formal evaluation agreements with OpenAI and Anthropic.
NIST · Evals · Red teaming
- 08
Center for AI Safety (CAIS)
2022 · San Francisco, USA
Non-profit that produced the widely signed 2023 statement on extinction risk from AI. Runs technical research, the SafeBench benchmark, and policy outreach.
Field building · Risk research
- 09
FAR AI
2022 · Berkeley, USA
Independent research non-profit that incubates new alignment research agendas and convenes the Alignment Workshop series.
Adversarial robustness · Incubation
- 10
Conjecture
2022 · London, UK
Lab pursuing 'cognitive emulation' as a safer alternative to opaque end-to-end systems, alongside vocal policy advocacy.
Cognitive emulation · Governance
How to use this list: follow at least one independent evaluator (METR or Apollo) and one government institute (UK AISI or US AISI) to triangulate official capability claims against external red-teaming.