AI Safety, Alignment, and Governance

Making advanced AI and AGI safe is a research field, an engineering practice, and a governance project at once. This hub maps how those three layers fit together in 2026.

fig.01 / alignment + oversight: a balance scale weighing an AI core against human policy and oversight guardrails, with annotated arrows (risograph field plate).
pillar.01

Alignment Research

Technical work to make AI systems pursue intended goals and values - scalable oversight, interpretability, honesty, and corrigibility. Active programs at Anthropic, OpenAI, Google DeepMind, MIRI, CHAI (UC Berkeley), and the Center for AI Safety.

pillar.02

AGI Safety

The harder problem of keeping highly capable, general-purpose systems aligned as they begin to act autonomously over long horizons. Framed in the work of Stuart Russell, Nick Bostrom, and Paul Christiano, and now central to frontier-lab safety teams.

pillar.03

AI Risk

Near-term harms (bias, misinformation, cyber misuse, labour disruption) and longer-term risks (loss of human oversight, biosecurity, concentration of power). Treated by the field as a continuum, not rival camps.

pillar.04

AI Evaluation

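Standardised testing of capability and risk - red teaming, dangerous-capability evals, benchmarks like MMLU, GPQA, and SWE-bench, and the evaluation programs at the US AI Safety Institute (renamed the Center for AI Standards and Innovation, CAISI, in 2025) and the UK AI Safety Institute (renamed the AI Security Institute in 2025).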

pillar.05

AI Oversight & Transparency

Mechanisms for inspecting models and deployments: model cards, system cards, audit logs, incident reporting, and pre-deployment access for safety institutes.

pillar.06

Responsible & Trustworthy AI

Operational standards covering fairness, privacy, security, and accountability. Anchored by the NIST AI Risk Management Framework (2023) and ISO/IEC 42001 (2023).

pillar.07

AI Governance

Laws, regulations, and institutions shaping how AI is built and deployed - the EU AI Act, US executive orders, the Bletchley and Seoul declarations, and the Frontier Model Forum.

pillar.08

AGI Governance

Emerging proposals for governing frontier and general-purpose AI specifically: compute thresholds, licensing, safety cases, and international coordination on advanced AI development.
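
A back-of-the-envelope check makes the compute-threshold idea concrete. The sketch below assumes the commonly used approximation of 6 x parameters x training tokens for dense-transformer training compute, and compares the result against two publicly stated thresholds: 10^25 FLOP (the EU AI Act's presumption of systemic risk for general-purpose models) and 10^26 FLOP (the reporting threshold in the 2023 US Executive Order 14110, since rescinded). The function names and the example model sizes are hypothetical, and real regulatory accounting is more involved than this estimate.

```python
# Rough sketch: estimate training compute and compare it to published thresholds.
# Assumption: training FLOP ~= 6 * N * D for a dense transformer
# (N = parameter count, D = training tokens). Model sizes below are hypothetical.

EU_AI_ACT_SYSTEMIC_RISK_FLOP = 1e25  # EU AI Act: presumption of systemic risk above this
US_EO_14110_REPORTING_FLOP = 1e26    # US EO 14110 (2023, since rescinded): reporting above this

THRESHOLDS = {
    "EU AI Act systemic-risk presumption": EU_AI_ACT_SYSTEMIC_RISK_FLOP,
    "US EO 14110 reporting (rescinded)": US_EO_14110_REPORTING_FLOP,
}


def estimate_training_flop(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute with the 6 * N * D rule of thumb."""
    return 6.0 * n_params * n_tokens


def thresholds_crossed(flop: float) -> list[str]:
    """Return the names of all thresholds that the given compute exceeds."""
    return [name for name, limit in THRESHOLDS.items() if flop > limit]


if __name__ == "__main__":
    # Hypothetical models: (label, parameters, training tokens)
    examples = [
        ("70B params, 15T tokens", 7e10, 1.5e13),
        ("400B params, 15T tokens", 4e11, 1.5e13),
    ]
    for label, n_params, n_tokens in examples:
        flop = estimate_training_flop(n_params, n_tokens)
        crossed = thresholds_crossed(flop) or ["none"]
        print(f"{label}: ~{flop:.1e} FLOP, crosses: {', '.join(crossed)}")
```

Under these assumptions the first hypothetical model lands below both thresholds and the second exceeds only the EU figure, which illustrates why the choice of threshold matters as much as the measurement itself.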

// CORE_THESIS

Safety is not a feature you add at the end. For advanced AI, it is the architecture.

Every credible roadmap to safe AGI treats alignment, evaluation, and governance as co-evolving disciplines - each constraining and informing the others. Treating any one in isolation is the most common failure mode.

How the field is organised in 2026

Alignment research aims to make systems pursue intended goals. AI evaluation measures whether they actually do. AI oversight and AI transparency create the institutional machinery to act on what evaluations reveal. AI governance sets the legal and normative constraints inside which the other layers operate. AGI governance extends those constraints to general-purpose and frontier systems specifically.

The institutional landscape consolidated quickly. The EU AI Act entered into force in August 2024, with general-purpose AI obligations phasing in through 2025-2027. The UK and US AI Safety Institutes signed a partnership in April 2024 and have since run joint pre-deployment evaluations. The International AI Safety Report, chaired by Turing Award winner Yoshua Bengio, became the field's reference scientific summary in 2025 and is updated annually.

Responsible AI as the deployment layer

Responsible AI and trustworthy AI are the practitioner-facing terms for operationalising all of this: fairness, privacy, security, accountability, and incident response. The NIST AI Risk Management Framework (2023) and ISO/IEC 42001 (2023) are the most widely adopted reference standards, and several frontier developers now publish capability-threshold policies in the same vein as Anthropic's Responsible Scaling Policy, including OpenAI's Preparedness Framework and Google DeepMind's Frontier Safety Framework.
