TY - GEN
T1 - Verifying the Steps of Deductive Reasoning Chains
AU - Sadeddine, Zacchary
AU - Suchanek, Fabian M.
N1 - Publisher Copyright: © 2025 Association for Computational Linguistics.
PY - 2025/1/1
Y1 - 2025/1/1
N2 - As Large Language Models penetrate everyday life more and more, it becomes essential to measure the correctness of their output. In this paper, we propose a novel task: the automatic verification of individual reasoning steps in a logical deductive Chain-of-Thought. This task addresses two well-known problems of LLMs, hallucination and incorrect reasoning. We propose a new dataset of logical reasoning chains, in which the individual deduction steps have been manually annotated for soundness, and benchmark several methods on it. We find that LLMs can detect unsound reasoning steps fairly well, but argue that verification has to be performed by transparent methods instead. We test symbolic methods, but find that they under-perform. We develop a neuro-symbolic baseline called VANESSA that comes closer to the performance of LLMs.
UR - https://www.scopus.com/pages/publications/105028600071
U2 - 10.18653/v1/2025.findings-acl.25
DO - 10.18653/v1/2025.findings-acl.25
M3 - Conference contribution
AN - SCOPUS:105028600071
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 456
EP - 475
BT - Findings of the Association for Computational Linguistics
A2 - Che, Wanxiang
A2 - Nabende, Joyce
A2 - Shutova, Ekaterina
A2 - Pilehvar, Mohammad Taher
PB - Association for Computational Linguistics (ACL)
T2 - 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Y2 - 27 July 2025 through 1 August 2025
ER -