A Study on Hierarchical Text Classification as a Seq2seq Task

Fatos Torba, Christophe Gravier, Charlotte Laclau, Abderrhammen Kammoun, Julien Subercaze

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

With the progress of generative neural models, Hierarchical Text Classification (HTC) can be cast as a generative task: given an input text, the model generates the sequence of predicted class labels, taken from a label tree of arbitrary width and depth. Treating HTC as a generative task introduces multiple modeling choices. These range from the order in which the class tree is visited (which defines the order in which tokens are generated), through whether decoding is constrained to labels that respect the predictions made at the previous level, to the choice of the pre-trained language model itself. Each HTC model therefore differs from the others not only in architecture but also in the modeling choices that were made. Prior contributions lack transparent modeling choices and open implementations, which makes it hard to assess whether model performance stems from architectural or modeling decisions. For these reasons, this paper presents an analysis of the impact of different modeling choices, along with common model errors and successes for this task. The analysis is based on an open framework released with the paper, which can facilitate future contributions in the field by providing datasets, metrics, an error analysis toolkit, and the ability to readily test various modeling choices for a given model.
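
Two of the modeling choices named in the abstract, the tree traversal order and level-constrained decoding, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy label tree, the virtual `ROOT` node, and the helper names `linearize` and `allowed_labels` are hypothetical and are not taken from the paper or its framework.

```python
from collections import deque

# Hypothetical toy label tree (parent -> children); "ROOT" is a virtual root.
TREE = {
    "ROOT": ["Science", "Sports"],
    "Science": ["Physics", "Biology"],
    "Sports": ["Football"],
    "Physics": [],
    "Biology": [],
    "Football": [],
}


def linearize(gold_labels, order="dfs"):
    """Return the gold labels in the order a DFS or BFS traversal visits them.

    The resulting list is one possible target sequence for a seq2seq model.
    """
    gold = set(gold_labels)
    frontier = deque(["ROOT"])
    sequence = []
    while frontier:
        # DFS treats the deque as a stack, BFS treats it as a queue.
        node = frontier.pop() if order == "dfs" else frontier.popleft()
        if node in gold:
            sequence.append(node)
        children = TREE[node]
        # For DFS, push children reversed so they are visited left to right.
        frontier.extend(reversed(children) if order == "dfs" else children)
    return sequence


def allowed_labels(previous_level_predictions):
    """Labels a level-constrained decoder may emit at the next level:
    only children of the labels predicted at the previous level."""
    return [c for parent in previous_level_predictions for c in TREE[parent]]


gold = ["Science", "Physics", "Sports", "Football"]
dfs_target = linearize(gold, order="dfs")
bfs_target = linearize(gold, order="bfs")
first_level = allowed_labels(["ROOT"])
second_level = allowed_labels(["Science"])
```

With this toy tree, the depth-first target keeps each parent next to its descendants, while the breadth-first target emits all first-level labels before any second-level label. In a real decoder the constraint would apply to sub-word tokens rather than whole label names, which this sketch does not attempt to model.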

Original language: English
Title of host publication: Advances in Information Retrieval - 46th European Conference on Information Retrieval, ECIR 2024, Proceedings
Editors: Nazli Goharian, Nicola Tonellotto, Yulan He, Aldo Lipani, Graham McDonald, Craig Macdonald, Iadh Ounis
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 287-296
Number of pages: 10
ISBN (Print): 9783031560620
Publication status: Published - 1 Jan 2024
Event: 46th European Conference on Information Retrieval, ECIR 2024 - Glasgow, United Kingdom
Duration: 24 Mar 2024 - 28 Mar 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14610 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 46th European Conference on Information Retrieval, ECIR 2024
Country/Territory: United Kingdom
City: Glasgow
Period: 24/03/24 - 28/03/24

Keywords

  • Hierarchical text classification
  • generative model
  • reproducibility
