Landmark AGI papers

The twenty papers, reports, and regulations whose ideas you can find inside almost every frontier model and AI policy today.

This list is opinionated. It privileges papers whose ideas became infrastructure: architectures every lab now uses, training techniques that turned research demos into products, and policy frameworks that govern how the field is allowed to operate. Entries are roughly chronological within their theme.

  1. Attention Is All You Need

    Vaswani et al. · NeurIPS, 2017

    Summary. Introduces the Transformer architecture, replacing recurrence with self-attention.

    Why it matters. The architectural foundation of every modern frontier model.

  2. Language Models are Few-Shot Learners (GPT-3)

    Brown et al. · NeurIPS, 2020

    Summary. Demonstrates that scaling a Transformer language model unlocks broad few-shot capability.

    Why it matters. Defined the modern LLM paradigm and ignited the scaling race.

  3. Scaling Laws for Neural Language Models

    Kaplan et al. · arXiv, 2020

    Summary. Reports empirical power-law relationships between loss and model size, dataset size, and compute.

    Why it matters. Formalised the predictability of scaling and shaped frontier training plans.

  4. Training Compute-Optimal Large Language Models (Chinchilla)

    Hoffmann et al. · DeepMind, 2022

    Summary. Shows most large models had been undertrained on data; rebalances compute toward more tokens.

    Why it matters. Reset the balance between model size and training tokens that later training runs aim for.

  5. Highly accurate protein structure prediction with AlphaFold

    Jumper et al. · Nature, 2021

    Summary. Predicts protein 3D structures from amino-acid sequence at near-experimental accuracy, a major advance on the 50-year-old protein folding problem.

    Why it matters. First domain in which an AI system became the reference instrument for science.

  6. Training language models to follow instructions with human feedback (InstructGPT)

    Ouyang et al. · NeurIPS, 2022

    Summary. Applies RLHF to align a language model with human instructions and preferences.

    Why it matters. The technique that made ChatGPT, Claude, and Gemini usable products.

  7. Constitutional AI: Harmlessness from AI Feedback

    Bai et al. · Anthropic, 2022

    Summary. Trains a model to critique and revise its own outputs against a written constitution.

    Why it matters. Founding methodology behind Claude and a major thread in scalable oversight.

  8. Sparks of Artificial General Intelligence

    Bubeck et al. · Microsoft Research, 2023

    Summary. Early empirical study of GPT-4 arguing it shows fragments of general intelligence.

    Why it matters. Reframed the public debate about how close current systems are to AGI.

  9. Emergent Abilities of Large Language Models

    Wei et al. · TMLR, 2022

    Summary. Catalogues capabilities that appear abruptly past a scale threshold.

    Why it matters. Crystallised the discussion of emergence and unpredictability in LLMs.

  10. Chain-of-Thought Prompting Elicits Reasoning

    Wei et al. · NeurIPS, 2022

    Summary. Shows that few-shot prompts containing worked, step-by-step reasoning substantially improve large models' performance on reasoning tasks.

    Why it matters. Set the stage for explicit reasoning models such as o1 and o3.

  11. Toy Models of Superposition

    Elhage et al. · Anthropic, 2022

    Summary. Explains how neural networks pack more features than they have dimensions.

    Why it matters. Foundational reading for mechanistic interpretability.

  12. Scaling Monosemanticity

    Templeton et al. · Anthropic, 2024

    Summary. Extracts millions of interpretable features from Claude 3 Sonnet using sparse autoencoders.

    Why it matters. Showed mechanistic interpretability can scale to frontier production models.

  13. Discovering Language Model Behaviors with Model-Written Evaluations

    Perez et al. · Anthropic, 2023

    Summary. Uses models to generate large-scale behavioural evaluations of other models.

    Why it matters. Provided a template for modern model-evaluation pipelines.

  14. GPT-4 Technical Report

    OpenAI · OpenAI, 2023

    Summary. System card and capability summary for the model that defined the frontier in 2023–24.

    Why it matters. The reference document for the first widely deployed multimodal frontier model.

  15. Gemini: A Family of Highly Capable Multimodal Models

    Google DeepMind · DeepMind, 2023

    Summary. Introduces Google's natively multimodal model family.

    Why it matters. Marked Google's unified post-merger frontier offering.

  16. Llama 2: Open Foundation and Fine-Tuned Chat Models

    Touvron et al. · Meta AI, 2023

    Summary. Open-weight release of Llama 2 models from 7B to 70B parameters, with documentation of their safety training.

    Why it matters. Anchored the modern open-weights ecosystem.

  17. AlphaProof and AlphaGeometry 2

    Google DeepMind · DeepMind, 2024

    Summary. Systems that solve International Mathematical Olympiad problems at silver-medal level.

    Why it matters. Concrete evidence of frontier mathematical reasoning by AI.

  18. International AI Safety Report 2025

    Bengio et al. · UK Government, 2025

    Summary. First annual consensus report on advanced AI risks, chaired by Yoshua Bengio and backed by 30 countries.

    Why it matters. The closest thing the field has to an IPCC-style assessment.

  19. NIST AI Risk Management Framework (AI RMF 1.0)

    NIST · NIST, 2023

    Summary. Voluntary US framework for managing AI risks across the system lifecycle.

    Why it matters. The reference governance framework most US enterprises follow.

  20. EU AI Act (Regulation 2024/1689)

    European Parliament & Council · EUR-Lex, 2024

    Summary. Risk-tiered regulation of AI systems, with dedicated rules for general-purpose AI models.

    Why it matters. The first comprehensive horizontal AI law from a major jurisdiction.

How to use this list: read the abstracts of the first ten to understand modern AI capabilities; read the last three (International AI Safety Report, NIST AI RMF, EU AI Act) to understand the rules.