TY - GEN
T1 - Refining Wikidata Taxonomy using Large Language Models
AU - Peng, Yiwen
AU - Bonald, Thomas
AU - Alam, Mehwish
N1 - Publisher Copyright: © 2024 Owner/Author.
PY - 2024/10/21
Y1 - 2024/10/21
AB - Due to its collaborative nature, Wikidata is known to have a complex taxonomy, with recurrent issues like the ambiguity between instances and classes, the inaccuracy of some taxonomic paths, the presence of cycles, and the high level of redundancy across classes. Manual efforts to clean up this taxonomy are time-consuming and prone to errors or subjective decisions. We present WiKC, a new version of Wikidata taxonomy cleaned automatically using a combination of Large Language Models (LLMs) and graph mining techniques. Operations on the taxonomy, such as cutting links or merging classes, are performed with the help of zero-shot prompting on an open-source LLM. The quality of the refined taxonomy is evaluated from both intrinsic and extrinsic perspectives, on a task of entity typing for the latter, showing the practical interest of WiKC.
KW - graph mining
KW - knowledge graphs
KW - large language model
U2 - 10.1145/3627673.3679156
DO - 10.1145/3627673.3679156
M3 - Conference contribution
AN - SCOPUS:85209994602
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 5395
EP - 5399
BT - CIKM 2024 - Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 33rd ACM International Conference on Information and Knowledge Management, CIKM 2024
Y2 - 21 October 2024 through 25 October 2024
ER -