Gaps or Hallucinations? Scrutinizing Machine-Generated Legal Analysis for Fine-grained Text Evaluations

Abe Bohan Hou, William Jurayj, Nils Holzenberger, Andrew Blair-Stanek, Benjamin Van Durme

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Large Language Models (LLMs) show promise as a writing aid for professionals performing legal analyses. However, LLMs can often hallucinate in this setting, in ways that are difficult for non-professionals and existing text evaluation metrics to recognize. In this work, we pose the question: when can machine-generated legal analysis be evaluated as acceptable? We introduce the neutral notion of gaps, as opposed to hallucinations in a strict erroneous sense, to refer to the difference between human-written and machine-generated legal analysis. Gaps do not always equate to invalid generation. Working with legal experts, we consider the CLERC generation task proposed in Hou et al. (2024b), leading to a taxonomy, a fine-grained detector for predicting gap categories, and an annotated dataset for automatic evaluation. Our best detector achieves 67% F1 score and 80% precision on the test set. Employing this detector as an automated metric on legal analysis generated by SOTA LLMs, we find around 80% contain hallucinations of different kinds.

Original language: English
Title of host publication: NLLP 2024 - Natural Legal Language Processing Workshop 2024, Proceedings of the Workshop
Editors: Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Catalina Goanta, Daniel Preotiuc-Pietro, Gerasimos Spanakis
Publisher: Association for Computational Linguistics (ACL)
Pages: 280-302
Number of pages: 23
ISBN (Electronic): 9798891761834
Publication status: Published - 1 Jan 2024
Event: 6th Natural Legal Language Processing Workshop 2024, NLLP 2024, co-located with the 2024 Conference on Empirical Methods in Natural Language Processing - Miami, United States
Duration: 16 Nov 2024 → …
