A benchmark of expert-level academic questions to assess AI capabilities

  • Center for AI Safety
  • Scale AI
  • HLE Contributors Consortium
  • Center for AI Safety
  • Scale AI, Inc.
  • University of Zurich
  • University of Oxford
  • The University of Sheffield
  • University of Illinois at Urbana-Champaign
  • Northwestern University
  • The Ohio State University
  • Complexity Science Hub Vienna
  • Duke University
  • Dell, Inc.
  • University of Wisconsin-Madison
  • Princeton University
  • University of Washington
  • Nanyang Technological University
  • University of Southern California
  • Stanford University
  • University of Maryland, College Park
  • Harvard University
  • Google Inc.
  • Indiana University Bloomington
  • ENAC-IIC-GEL
  • Korea Advanced Institute of Science and Technology
  • McGill University
  • University of Illinois at Chicago
  • Weizmann Institute of Science Israel
  • University of Jyväskylä
  • Carnegie Mellon University
  • Providence College
  • Austrian Institute of Technology
  • Technical University of Munich
  • California Institute of Technology
  • University of Michigan, Ann Arbor
  • Cardiovascular and Vascular Surgery Training and Research Hospital
  • Ankara University
  • Long Beach VA and University of California
  • Technion - Israel Institute of Technology
  • University of Melbourne
  • University of Tübingen
  • Columbia University
  • Université de Montréal
  • Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
  • University of Chicago
  • New York University
  • Brighton Law School
  • University of North Carolina at Chapel Hill
  • University of Toronto
  • Pompeu Fabra University (UPF)
  • Trinity School
  • University of California, Berkeley
  • University of California, Davis
  • Purdue University
  • Queen's University
  • Bilkent University
  • Universidad del Valle, Cali
  • Goethe University Frankfurt am Main
  • ETH Zurich
  • University of Edinburgh
  • InxiteOut
  • Ipatimup Diagnósticos
  • EleutherAI
  • Max Planck Institute for Security and Privacy
  • University of Leiden
  • University of Manchester
  • Alpen-Adria-Universität Klagenfurt
  • T-Systems International GmbH
  • Universiteit Utrecht
  • Johns Hopkins University
  • Gakugei Shuppan-sha
  • Imperial College London
  • Emory University
  • Cornell University
  • Novo Nordisk A/S
  • New Jersey Institute of Technology
  • ADIA Lab
  • Two Minute Papers
  • Indraprastha Institute of Information Technology, Delhi
  • Northern Illinois University
  • Aleph Alpha GmbH
  • Baylor College of Medicine
  • Korea University of Technology and Education
  • Morgridge Institute for Research
  • Vienna University of Technology
  • National University of Singapore
  • University of Auckland
  • Massachusetts Institute of Technology
  • University of Seoul
  • University of Copenhagen
  • University of California, Santa Cruz
  • Amazon Machine Learning Solutions Lab
  • College of Computing
  • University of California, San Diego
  • IRBLleida-University of Lleida
  • Scripps Research
  • Dalhousie University
  • University of Lausanne
  • The Open University
  • Mayo Clinic
  • RBC Borealis
  • Universitat Politècnica de València
  • IBM Research
  • Aalborg University
  • Minerva University
  • Jagiellonian University
  • ICS/University of Groningen
  • Konkuk University
  • University of Pennsylvania
  • Microsoft Corporation
  • SUMM AI
  • University of California, Santa Barbara
  • The University of Western Australia
  • University of Kaiserslautern
  • University of Bern
  • Max Planck Institute for Intelligent Systems
  • University of Sydney
  • Westmead Hospital
  • Hexworks
  • National Aerospace University ‘Kharkiv Aviation Institute’
  • Adobe Systems
  • Saxion University of Applied Sciences
  • Universidad Nacional de Educación a Distancia
  • University College London
  • The Alan Turing Institute
  • University of British Columbia
  • Instituto de Engenharia de Sistemas e Computadores, Lisbon
  • Boston University
  • Leonardo S.p.A
  • Atilim University
  • George Mason University
  • Delft University of Technology
  • National Information Processing Institute
  • Clinical Investigation Center
  • University of Minnesota Twin Cities
  • VIT University
  • University of Tokyo
  • Precision Neurology Program & APDA Center for Advanced Parkinson Research
  • Fondazione Bruno Kessler
  • University of Maribor
  • University of Waterloo
  • Perimeter Institute for Theoretical Physics
  • Williams College
  • University of Delaware
  • University of California, Los Angeles
  • Murdoch University
  • The University of Texas at Austin
  • OpenAI, Inc.
  • Cairo University
  • IDA Business Park
  • Manipal University Jaipur
  • University of Bologna
  • Instituto Politécnico Nacional
  • Menoufia University
  • Happy Technologies LLC
  • Aligarh Muslim University
  • North Carolina State University
  • Chulalongkorn University
  • Università di Trento
  • Arizona State University
  • DFKI GmbH
  • RMIT University
  • Standard Intelligence
  • Research Centre Julich
  • Sheffield Teaching Hospitals NHS Foundation Trust
  • Genomia Diagnostics Research
  • CSMSS Chh. Shahu College of Engineering
  • Quotient AI
  • Missouri University of Science and Technology
  • Patched Codes
  • University of the Fraser Valley
  • Central Mindanao University
  • Univ. of Yaoundé
  • Bethune-Cookman University
  • LGM
  • University of São Paulo
  • Université Mohamed Premier
  • Hemvati Nandan Bahuguna Garhwal University
  • University Malaya
  • Tanta University
  • Royal Holloway University of London
  • Leibniz Institute for Science and Mathematics Education
  • Indian Institute of Technology Delhi
  • Pondicherry Engineering College
  • John Crane UK Ltd.
  • CNRS IRL-IFAECI
  • Universidad Tecnológica Nacional
  • Intelligent Geometries
  • The Hebrew University of Jerusalem
  • Pennsylvania College of Technology
  • Ho Chi Minh City University of Technology
  • Australian National University
  • Universität des Saarlandes
  • AE Studio
  • Rutgers University–New Brunswick
  • Anthropic
  • Google DeepMind
  • HUN-REN KOZPONT
  • Rice University
  • Instituto Gonçalo Moniz
  • James Madison University
  • University of Cambridge
  • Dartmouth College
  • University of Virginia
  • Nottingham Trent University
  • Washington University in St. Louis
  • University of Heidelberg
  • Central College in Pella
  • OncoPrecision
  • University of Hertfordshire
  • University of Alabama in Huntsville
  • University of Warwick
  • Bournemouth University
  • University of Oklahoma
  • Florida Atlantic University
  • University of Arkansas
  • Jala University
  • University of Bristol
  • Hospital for Sick Children University of Toronto
  • Mansoura University
  • Bogazici University
  • Beni-Suef University
  • University of Bradford
  • CICESE Research Center
  • Women and Infants Hospital of Rhode Island-Warren Alpert Medical School of Brown University
  • Eastlake High School
  • The Future Paralegals of America
  • University of Mannheim
  • Children’s Hospital of Orange County
  • Humboldt-Universität zu Berlin
  • University of Rome
  • University of Valencia
  • Unidade Local de Saúde de Lisboa Ocidental
  • Durham University
  • KTH Royal Institute of Technology
  • Chalmers University of Technology
  • Nabu Technologies
  • University of Innsbruck
  • La Trobe University
  • Modulo Research Ltd
  • University of Granada
  • Max Planck Institute for Software Systems
  • New School for Social Research
  • University of Amsterdam
  • Politecnico di Milano
  • Sorbonne Université
  • Laboratoire de Probabilités et Modèles Aléatoires
  • Universidad de Morón
  • CISPA Helmholtz Center for Information Security
  • Universität Hamburg
  • Czech Technical University in Prague
  • Centre national de la recherche scientifique
  • Ecole Normale Supérieure de Lyon
  • Eastern Institute of Technology
  • Cranfield University
  • Abacus.AI
  • University 'Politehnica' of Bucharest
  • University of Vienna
  • The Hartree Centre
  • University of Texas at Arlington
  • AIM Intelligence, Inc.
  • Seoul National University
  • IIT BHU
  • Cisco Systems
  • Saint Mary's University
  • Temple University
  • Dyno Therapeutics, Inc.
  • CTTC/CERCA
  • Intuit Inc
  • University of Guelph
  • Gakushuin University
  • University of Mumbai
  • Drexel University
  • University of Oregon
  • Polytechnic University of the Philippines
  • Georgia State University
  • University of Pisa
  • TU Berlin
  • Materials Platform for Data Science
  • HomeEquity Bank
  • Fraunhofer IMTE
  • University of Montpellier (UMR MiVEGEC)
  • Gray Swan AI
  • University of Tennessee
  • CERo Therapeutics Holdings
  • SAMPE Switzerland
  • Universidad de Buenos Aires
  • King Saud University
  • INRIA Institut National de Recherche en Informatique et en Automatique
  • Ivy Natal
  • Intrinsic Innovation LLC
  • College of Eastern Idaho
  • University of North Texas
  • Stockholm University
  • Alexandru Ioan Cuza University
  • California Polytechnic State University, San Luis Obispo
  • University of Geneva
  • Virginia Polytechnic Institute and State University
  • University of Massachusetts-Lowell
  • University of Milano-Bicocca
  • Canadian University Dubai
  • University of Texas
  • Larkin Community Hospital
  • Leibniz Universität Hannover
  • Van Andel Institute
  • Monash University
  • Universite de Montreal
  • SDAIA
  • Instituto Superior Técnico
  • University of London
  • University of Padova
  • UK AI Safety Institute
  • Yale University
  • Posts and Telecommunications Institute of Technology Vietnam
  • Indian Institute of Technology Kharagpur
  • University of Calgary
  • Universidade de Lisboa
  • INSAIT
  • Ruhr-University Bochum
  • University of Arizona
  • PSL research University & IPSL
  • Tel Aviv University
  • All India Institute of Medical Sciences, New Delhi
  • University of Houston
  • ENS Paris-Saclay
  • Hewlett Packard Enterprise
  • Warsaw University of Technology
  • Northeastern University
  • European Organization for Nuclear Research
  • Rochester Institute of Technology
  • Department of Immunology
  • University of Windsor
  • Case Western Reserve University
  • TTIC-Toyota Technological Institute Chicago
  • Aalto University
  • Siili Solutions
  • Cohere
  • Donald and Barbara Zucker School of Medicine at Hofstra/Northwell
  • Ben Gurion University of the Negev
  • Indiana State University
  • University of Technology Sydney
  • TRR Designs
  • Charles University
  • Vrije Universiteit Brussel
  • Joint Faculty of the Brandenburg University of Technology Cottbus Senftenberg
  • St. Petersburg College
  • Universidad Agraria la Molina
  • KU Leuven
  • Swinburne University of Technology
  • University of Leeds
  • Yonsei University
  • Sanford Burnham Prebys
  • Corteva Agriscience
  • Synbionix
  • Manhattan School of Music
  • Snorkel AI, Inc
  • Universidad Iberoamericana
  • University of Miami
  • PeopleTec
  • UZ Brussel
  • Université Paris-Saclay
  • Institute for Molecular Manufacturing
  • Indian Institute of Technology Bombay
  • Diverging Mathematics
  • Martin Luther University Halle-Wittenberg
  • Maastricht University
  • National University Philippines
  • University of Bath
  • Contramont Research
  • Rockwell Automation
  • Federal University of Juiz de Fora (UFJF)
  • University of Luxembourg
  • Stony Brook University
  • Charité - Universitätsmedizin Berlin
  • University of Freiburg
  • Concordia University
  • Ross University School of Medicine
  • Tufts University
  • The Jackson Laboratory
  • Accenture
  • Metropolitan State University of Denver
  • University of Canterbury
  • Hereford College of Arts
  • Alberta Health Services
  • Auckland University of Technology
  • Queen Mary University of London
  • Georgia Southern University
  • Nimbus AI Ltd
  • Eötvös Loránd University
  • Kyiv Polytechnic Institute
  • RWTH Aachen University
  • NAS of Ukraine
  • Kiev School of Economics
  • Texas A&M University

Research output: Contribution to journal, Article, peer-review

Abstract

Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve more than 90% accuracy on popular benchmarks such as Measuring Massive Multitask Language Understanding (ref. 1), limiting informed measurement of state-of-the-art LLM capabilities. Here, in response, we introduce Humanity’s Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be an expert-level closed-ended academic benchmark with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable but cannot be quickly answered by internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a marked gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.

Original language: English
Pages (from-to): 1139-1146
Number of pages: 8
Journal: Nature
Volume: 649
Issue number: 8099
Publication status: Published - 29 Jan 2026
