Safety & alignment organisations

Independent labs and state-run institutes that evaluate frontier models and research how to align them.

AI safety research splits roughly into three camps: theoretical alignment (how would we make a superhuman system reliably do what we want?), empirical alignment (what works on today's frontier models?), and evaluation (how dangerous are current systems, really?). The organisations below span all three. Several are now embedded in national-security infrastructure through formal evaluation agreements with frontier labs.

01
Machine Intelligence Research Institute (MIRI)
2000 · Berkeley, USA
The longest-running AGI-safety research organisation. Pioneered formal work on decision theory, corrigibility, and misalignment risk.
Agent foundations · Theory
02
Alignment Research Center (ARC)
2021 · Berkeley, USA
Theory and evaluations group founded by former OpenAI alignment lead Paul Christiano. Spawned the dangerous-capability evaluation organisation METR.
Eliciting Latent Knowledge · Evals
03
METR
2023 · Berkeley, USA
Model Evaluation and Threat Research. Conducts independent autonomous-capability evaluations of frontier models, including for OpenAI, Anthropic, and the US AISI.
Autonomy evals · Pre-deployment
04
Apollo Research
2023 · London, UK
Specialises in evaluating frontier models for deceptive and scheming behaviour, and in publishing case studies used by safety institutes worldwide.
Deception · Scheming evals
05
Redwood Research
2021 · Berkeley, USA
Empirical alignment lab focused on AI control: techniques that work even if models are misaligned. Co-publishes prominent work with Anthropic.
Control · Adversarial training
06
UK AI Security Institute (AISI)
2023 · London, UK
Government institute conducting pre-deployment evaluations of frontier models on behalf of the United Kingdom. The first state-run frontier-model evaluator.
National evals · Standards
07
US AI Safety Institute (US AISI)
2024 · Gaithersburg, USA
Housed at NIST. Develops US technical standards for frontier-model evaluation and red-teaming, including formal evaluation agreements with OpenAI and Anthropic.
NIST · Evals · Red teaming
08
Center for AI Safety (CAIS)
2022 · San Francisco, USA
Non-profit that produced the widely signed 2023 statement on extinction risk from AI. Runs technical research, the SafeBench benchmark, and policy outreach.
Field building · Risk research
09
FAR AI
2022 · Berkeley, USA
Independent research non-profit that incubates new alignment research agendas and convenes the Alignment Workshop series.
Adversarial robustness · Incubation
10
Conjecture
2022 · London, UK
Lab pursuing 'cognitive emulation' as a safer alternative to opaque end-to-end systems, alongside vocal policy advocacy.
Cognitive emulation · Governance

How to use this list: follow at least one independent evaluator (METR or Apollo) and one government institute (UK AISI or US AISI) to triangulate official capability claims against external red-teaming.
