
Highly fast text segmentation with pairwise markov chains

Research output: Chapter in a book, report, anthology, or collection; Conference contribution; Peer-reviewed

Abstract

The current trend in Natural Language Processing (NLP) is to use ever more extra data to build the best possible models. This brings higher computational costs and longer training times, makes deployment harder, and raises concerns about the models' carbon footprint, all of which point to a critical problem in the future. Against this trend, our goal is to develop NLP models that require no extra data and minimize training time. To that end, this paper explores two Markov chain models, the Hidden Markov Chain (HMC) and the Pairwise Markov Chain (PMC), for NLP segmentation tasks. We apply these models to three classic applications: POS tagging, Named-Entity Recognition, and Chunking. We develop an original method that adapts these models to the specific challenges of text segmentation and obtains competitive performance with very short training and execution times. PMC achieves results equivalent to those of Conditional Random Fields (CRF), one of the most widely applied models for these tasks when no extra data are used. Moreover, PMC training times are 30 times shorter than those of CRF, which validates this model given our objectives.
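
To make the model family concrete, here is a minimal Python sketch of a Hidden Markov Chain tagger, trained by counting and decoded with the Viterbi algorithm. It is an illustration under stated assumptions, not the method of the paper: the function names `train_hmc` and `viterbi`, the add-alpha smoothing, and the toy corpus are all hypothetical choices made for this example. A PMC generalizes this model by treating the pair (label, word) as jointly Markov; the paper's adaptation of PMC is not reproduced here.

```python
import math
from collections import Counter


def train_hmc(tagged_sentences, alpha=1.0):
    """Estimate HMC parameters by counting, with add-alpha smoothing (an assumption)."""
    init, trans, emit = Counter(), Counter(), Counter()
    tags, vocab = set(), set()
    for sentence in tagged_sentences:
        prev = None
        for word, tag in sentence:
            tags.add(tag)
            vocab.add(word)
            emit[(tag, word)] += 1
            if prev is None:
                init[tag] += 1          # tag of the first word
            else:
                trans[(prev, tag)] += 1  # tag-to-tag transition
            prev = tag

    tags = sorted(tags)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot shared by all unknown words
    init_total = sum(init.values())
    out_total, emit_total = Counter(), Counter()
    for (p, _), c in trans.items():
        out_total[p] += c
    for (t, _), c in emit.items():
        emit_total[t] += c

    def log_init(t):
        return math.log((init[t] + alpha) / (init_total + alpha * n_tags))

    def log_trans(p, t):
        return math.log((trans[(p, t)] + alpha) / (out_total[p] + alpha * n_tags))

    def log_emit(t, w):
        return math.log((emit[(t, w)] + alpha) / (emit_total[t] + alpha * n_words))

    return tags, log_init, log_trans, log_emit


def viterbi(words, model):
    """Return the most probable tag sequence for `words` under the HMC."""
    tags, log_init, log_trans, log_emit = model
    if not words:
        return []
    score = {t: log_init(t) + log_emit(t, words[0]) for t in tags}
    back = []  # one dict of back-pointers per position after the first
    for w in words[1:]:
        new_score, pointers = {}, {}
        for t in tags:
            best_prev = max(tags, key=lambda p: score[p] + log_trans(p, t))
            new_score[t] = score[best_prev] + log_trans(best_prev, t) + log_emit(t, w)
            pointers[t] = best_prev
        score = new_score
        back.append(pointers)
    path = [max(tags, key=lambda t: score[t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy corpus, invented for this example only.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
model = train_hmc(corpus)
print(viterbi(["the", "cat", "barks"], model))
```

Training here is a single counting pass over the labeled data, with no iterative optimization, which is the kind of property that keeps training time short for Markov chain models.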

Original language: English
Title of host publication: 6th International IEEE Congress on Information Science and Technology, CiSt 2020 - Proceeding
Editors: Mohammed El Mohajir, Mohammed Al Achhab, Badr Eddine El Mohajir, Bernadetta Kwintiana Ane, Ismail Jellouli
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 361-366
Number of pages: 6
ISBN (Electronic): 9781728166469
Publication status: Published - 5 June 2020
Event: 6th International IEEE Congress on Information Science and Technology, CiSt 2020 - Agadir - Essaouira, Morocco
Duration: 5 June 2020 to 12 June 2020

Publication series

Name: Colloquium in Information Science and Technology, CIST
Volume: 2020-June
ISSN (Print): 2327-185X
ISSN (Electronic): 2327-1884

Conference

Conference: 6th International IEEE Congress on Information Science and Technology, CiSt 2020
Country/Territory: Morocco
City: Agadir - Essaouira
Period: 5/06/20 to 12/06/20
