TY - GEN
T1 - Mapping and Cleaning Open Commonsense Knowledge Bases with Generative Translation
AU - Romero, Julien
AU - Razniewski, Simon
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023/1/1
Y1 - 2023/1/1
AB - Structured knowledge bases (KBs) are the backbone of many knowledge-intensive applications, and their automated construction has received considerable attention. In particular, open information extraction (OpenIE) is often used to induce structure from a text. However, although it allows high recall, the extracted knowledge tends to inherit noise from the sources and the OpenIE algorithm. Besides, OpenIE tuples contain an open-ended, non-canonicalized set of relations, making the extracted knowledge’s downstream exploitation harder. In this paper, we study the problem of mapping an open KB into the fixed schema of an existing KB, specifically for the case of commonsense knowledge. We propose approaching the problem by generative translation, i.e., by training a language model to generate fixed-schema assertions from open ones. Experiments show that this approach occupies a sweet spot between traditional manual, rule-based, or classification-based canonicalization and purely generative KB construction like COMET. Moreover, it produces higher mapping accuracy than the former while avoiding the association-based noise of the latter. Code and data are available (https://github.com/Aunsiels/GenT, julienromero.fr/data/GenT).
KW - Generative Language Models
KW - Open Knowledge Bases
KW - Schema Matching
UR - https://www.scopus.com/pages/publications/85177226854
U2 - 10.1007/978-3-031-47240-4_20
DO - 10.1007/978-3-031-47240-4_20
M3 - Conference contribution
AN - SCOPUS:85177226854
SN - 9783031472398
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 368
EP - 387
BT - The Semantic Web – ISWC 2023 - 22nd International Semantic Web Conference, Proceedings
A2 - Payne, Terry R.
A2 - Presutti, Valentina
A2 - Qi, Guilin
A2 - Poveda-Villalón, María
A2 - Stoilos, Giorgos
A2 - Hollink, Laura
A2 - Kaoudi, Zoi
A2 - Cheng, Gong
A2 - Li, Juanzi
PB - Springer Science and Business Media Deutschland GmbH
T2 - 22nd International Semantic Web Conference, ISWC 2023
Y2 - 6 November 2023 through 10 November 2023
ER -