LEGALBENCH: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

  • Neel Guha
  • Julian Nyarko
  • Daniel E. Ho
  • Christopher Ré
  • Adam Chilton
  • Aditya Narayana
  • Alex Chohlas-Wood
  • Austin Peters
  • Brandon Waldon
  • Daniel N. Rockmore
  • Diego Zambrano
  • Dmitry Talisman
  • Enam Hoque
  • Faiz Surani
  • Frank Fagan
  • Galit Sarfaty
  • Gregory M. Dickinson
  • Haggai Porat
  • Jason Hegland
  • Jessica Wu
  • Joe Nudell
  • Joel Niklaus
  • John Nay
  • Jonathan H. Choi
  • Kevin Tobia
  • Margaret Hagan
  • Megan Ma
  • Michael Livermore
  • Nikon Rasumov-Rahe
  • Nils Holzenberger
  • Noam Kolt
  • Peter Henderson
  • Sean Rehaag
  • Sharad Goel
  • Shang Gao
  • Spencer Williams
  • Sunny Gandhi
  • Tom Zur
  • Varun Iyer
  • Zehua Li
  • Stanford University
  • University of Chicago
  • Maxime Tools
  • Dartmouth College
  • LawBeta
  • South Texas College of Law
  • University of Toronto
  • St. Thomas University Benjamin L. Crump College of Law
  • Harvard Law School
  • Stanford Center for Legal Informatics - CodeX
  • University of Southern California
  • Georgetown University Law Center
  • University of Virginia
  • Osgoode Hall Law School
  • Harvard University
  • Casetext
  • Golden Gate University
  • Luddy School of Informatics - Indiana University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LEGALBENCH: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning. LEGALBENCH was built through an interdisciplinary process, in which we collected tasks designed and hand-crafted by legal professionals. Because these subject matter experts took a leading role in construction, tasks either measure legal reasoning capabilities that are practically useful, or measure reasoning skills that lawyers find interesting. To enable cross-disciplinary conversations about LLMs in the law, we additionally show how popular legal frameworks for describing legal reasoning—which distinguish between its many forms—correspond to LEGALBENCH tasks, thus giving lawyers and LLM developers a common vocabulary. This paper describes LEGALBENCH, presents an empirical evaluation of 20 open-source and commercial LLMs, and illustrates the types of research explorations LEGALBENCH enables.
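Because each LEGALBENCH task is a self-contained classification or reasoning problem with gold answers, per-task evaluation reduces to a simple scoring loop. Below is a minimal, illustrative sketch of such a loop in Python. It assumes the benchmark is hosted on the Hugging Face Hub as nguha/legalbench with "text" and "answer" columns per task; the column names and the answer_fn callable are assumptions for illustration, not the authors' evaluation harness.

```python
# Minimal sketch of a per-task evaluation loop (not the authors' harness).
# Assumes the benchmark is available on the Hugging Face Hub as "nguha/legalbench"
# and that each task exposes "text" and "answer" string columns -- both assumptions.
from datasets import load_dataset

def evaluate_task(task_name: str, answer_fn) -> float:
    """Exact-match accuracy of answer_fn on one LEGALBENCH task's test split."""
    data = load_dataset("nguha/legalbench", task_name, split="test")
    correct = sum(
        answer_fn(ex["text"]).strip().lower() == ex["answer"].strip().lower()
        for ex in data
    )
    return correct / len(data)

# Usage: answer_fn can wrap any of the open-source or commercial LLMs the
# paper evaluates, i.e. any callable mapping task text to an answer string.
# The task name and callable below are illustrative placeholders:
# accuracy = evaluate_task("abercrombie", my_llm_answer_fn)
```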

Original language: English
Title of host publication: Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Editors: A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, S. Levine
Publisher: Neural Information Processing Systems Foundation
ISBN (Electronic): 9781713899921
Publication status: Published - 1 Jan 2023
Event: 37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States
Duration: 10 Dec 2023 – 16 Dec 2023

Publication series

Name: Advances in Neural Information Processing Systems
Volume: 36
ISSN (Print): 1049-5258

Conference

Conference: 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Country/Territory: United States
City: New Orleans
Period: 10/12/23 – 16/12/23
