Unsupervised word polysemy quantification with multiresolution grids of contextual embeddings

  • École Polytechnique
  • Athens Univ. of Econ. and Business

Research output: Chapter in a book, report, anthology, or collection; Conference contribution; Peer-reviewed

Abstract

The number of senses of a given word, or polysemy, is a very subjective notion, which varies widely across annotators and resources. We propose a novel method to estimate polysemy based on simple geometry in the contextual embedding space. Our approach is fully unsupervised and purely data-driven. Through rigorous experiments, we show that our rankings are well correlated, with strong statistical significance, with 6 different rankings derived from famous human-constructed resources such as WordNet, OntoNotes, Oxford, Wikipedia, etc., for 6 different standard metrics. We also visualize and analyze the correlation between the human rankings and make interesting observations. A valuable by-product of our method is the ability to sample, at no extra cost, sentences containing different senses of a given word. Finally, the fully unsupervised nature of our approach makes it applicable to any language. Code and data are publicly available.

Original language: English
Title of host publication: EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 3391-3401
Number of pages: 11
ISBN (electronic): 9781954085022
Status: Published - 1 Jan 2021
Externally published: Yes
Event: 16th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2021 - Virtual, Online
Duration: 19 Apr 2021 to 23 Apr 2021

Publication series

Name: EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference

Conference

Conference: 16th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2021
City: Virtual, Online
Period: 19/04/21 to 23/04/21
