Refining Wikidata Taxonomy using Large Language Models

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Due to its collaborative nature, Wikidata is known to have a complex taxonomy, with recurrent issues like the ambiguity between instances and classes, the inaccuracy of some taxonomic paths, the presence of cycles, and the high level of redundancy across classes. Manual efforts to clean up this taxonomy are time-consuming and prone to errors or subjective decisions. We present WiKC, a new version of Wikidata taxonomy cleaned automatically using a combination of Large Language Models (LLMs) and graph mining techniques. Operations on the taxonomy, such as cutting links or merging classes, are performed with the help of zero-shot prompting on an open-source LLM. The quality of the refined taxonomy is evaluated from both intrinsic and extrinsic perspectives, on a task of entity typing for the latter, showing the practical interest of WiKC.

Original languageEnglish
Title of host publicationCIKM 2024 - Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
PublisherAssociation for Computing Machinery
Pages5395-5399
Number of pages5
ISBN (Electronic)9798400704369
DOIs
Publication statusPublished - 21 Oct 2024
Event33rd ACM International Conference on Information and Knowledge Management, CIKM 2024 - Boise, United States
Duration: 21 Oct 202425 Oct 2024

Publication series

NameInternational Conference on Information and Knowledge Management, Proceedings
ISSN (Print)2155-0751

Conference

Conference33rd ACM International Conference on Information and Knowledge Management, CIKM 2024
Country/TerritoryUnited States
CityBoise
Period21/10/2425/10/24

Keywords

  • graph mining
  • knowledge graphs
  • large language model

Fingerprint

Dive into the research topics of 'Refining Wikidata Taxonomy using Large Language Models'. Together they form a unique fingerprint.

Cite this