Refining Wikidata Taxonomy using Large Language Models

  • Institut Polytechnique de Paris

Research output: Chapter in a book, report, anthology, or collection; Conference contribution; Peer-reviewed

Abstract

Due to its collaborative nature, Wikidata is known to have a complex taxonomy, with recurrent issues like the ambiguity between instances and classes, the inaccuracy of some taxonomic paths, the presence of cycles, and the high level of redundancy across classes. Manual efforts to clean up this taxonomy are time-consuming and prone to errors or subjective decisions. We present WiKC, a new version of Wikidata taxonomy cleaned automatically using a combination of Large Language Models (LLMs) and graph mining techniques. Operations on the taxonomy, such as cutting links or merging classes, are performed with the help of zero-shot prompting on an open-source LLM. The quality of the refined taxonomy is evaluated from both intrinsic and extrinsic perspectives, on a task of entity typing for the latter, showing the practical interest of WiKC.
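
To make the cycle problem mentioned in the abstract concrete, here is a minimal sketch of finding a cycle in a toy "subclass of" graph and cutting one link to break it. It is not the WiKC pipeline: the class labels, the `find_cycle` and `cut_link` helpers, and the choice of which link to cut are all assumptions made for illustration. According to the abstract, the actual method decides such operations with zero-shot prompting on an LLM, whereas here the link is picked by hand.

```python
from typing import Dict, List, Optional

# Toy "subclass of" edges (child -> parents). The labels are invented for this example.
SUBCLASS_OF: Dict[str, List[str]] = {
    "poodle": ["dog"],
    "dog": ["mammal"],
    "mammal": ["animal"],
    "animal": ["dog"],  # erroneous link that closes a cycle
}


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a node list (first node repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for parent in graph.get(node, []):
            state = color.get(parent, WHITE)
            if state == GREY:
                # The parent is already on the current path: a cycle is closed.
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


def cut_link(graph: Dict[str, List[str]], child: str, parent: str) -> Dict[str, List[str]]:
    """Return a copy of the graph without the child -> parent edge."""
    return {
        c: [p for p in parents if not (c == child and p == parent)]
        for c, parents in graph.items()
    }


cycle = find_cycle(SUBCLASS_OF)
# Hand-picked cut for the example; a real pipeline would need a principled choice.
cleaned = cut_link(SUBCLASS_OF, "animal", "dog")
```

The depth-first search uses recursion, which is fine for a toy graph; a taxonomy with millions of classes would call for an iterative traversal or a graph library.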

Original language: English
Title of host publication: CIKM 2024 - Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 5395-5399
Number of pages: 5
ISBN (Electronic): 9798400704369
DOIs:
Status: Published - 21 Oct 2024
Event: 33rd ACM International Conference on Information and Knowledge Management, CIKM 2024 - Boise, United States
Duration: 21 Oct 2024 to 25 Oct 2024

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings
ISSN (Print): 2155-0751

Conference

Conference: 33rd ACM International Conference on Information and Knowledge Management, CIKM 2024
Country/Territory: United States
City: Boise
Period: 21/10/24 to 25/10/24
