Decoding the Hierarchy: A Hybrid Approach to Hierarchical Multi-label Text Classification

  • Fatos Torba
  • Christophe Gravier
  • Charlotte Laclau
  • Abderrhammen Kammoun
  • Julien Subercaze

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Hierarchical multi-label text classification (HMTC) aims to predict multiple labels from a tree-like hierarchy for a given input text. Recent approaches frame HMTC as a seq2seq problem, where the objective is to predict the sequence of associated labels, regardless of their order or position in the hierarchy. Despite promising results, these approaches rely solely on attention over previously generated tokens. This limitation prevents them from acquiring information about the global hierarchy and may lead to error accumulation as the model learns hierarchical cues among labels. We propose a novel HMTC model based on a hybrid version of the encoder-decoder architecture in which the decoder is pre-populated with the embeddings of all labels. By leveraging the decoder's Cross-Attention and Hierarchical Self-Attention mechanisms, we obtain label representations that benefit from both instance-level and global label-wise information. Empirical experiments on four HMTC benchmark datasets demonstrate the effectiveness of our approach, which sets new state-of-the-art results. Code (https://github.com/FatosTorba/HLPD) and datasets are made available to facilitate reproducibility and future work.
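The abstract describes the hybrid decoder only at a high level, so the following is a minimal NumPy sketch under stated assumptions, not the authors' HLPD implementation. It illustrates the general idea of a decoder whose input slots are the embeddings of every label (rather than previously generated tokens), followed by a hierarchy-masked self-attention among labels and a cross-attention from each label to the encoded text. Everything concrete here is an assumption: single-head dot-product attention with no learned projections, a mask in which each label attends only to itself, its ancestors and its descendants, a per-label sigmoid scoring head, and the hypothetical names `hierarchy_mask`, `hybrid_decoder_layer`, `predict` and the `parents` encoding of the tree.

```python
import numpy as np


def hierarchy_mask(parents):
    """Boolean (n, n) mask; mask[i, j] is True when label i may attend to label j.

    Assumption (hypothetical): a label attends to itself, its ancestors and its
    descendants, but not to siblings or other branches of the tree.
    parents[i] is the index of label i's parent, or None for a top-level label.
    """
    n = len(parents)
    mask = np.eye(n, dtype=bool)  # every label attends to itself
    for i in range(n):
        j = parents[i]
        while j is not None:  # walk from label i up to its root
            mask[i, j] = True  # i attends to its ancestor j
            mask[j, i] = True  # ancestor j attends to its descendant i
            j = parents[j]
    return mask


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def attention(q, k, v, mask=None):
    """Single-head scaled dot-product attention without learned projections."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)  # blocked pairs get zero weight
    return softmax(scores) @ v


def hybrid_decoder_layer(label_emb, enc_states, parents):
    # Decoder input is the full set of label embeddings (one slot per label).
    # 1) Hierarchical self-attention among labels, restricted by the tree mask.
    h = label_emb + attention(label_emb, label_emb, label_emb, hierarchy_mask(parents))
    # 2) Cross-attention: every label slot attends to the encoded input tokens.
    h = h + attention(h, enc_states, enc_states)
    return h


def predict(label_emb, enc_states, parents, w, threshold=0.5):
    """Hypothetical per-label sigmoid head; returns one boolean per label."""
    h = hybrid_decoder_layer(label_emb, enc_states, parents)
    probs = 1.0 / (1.0 + np.exp(-(h @ w)))
    return probs >= threshold


# Toy usage with random vectors standing in for trained embeddings.
rng = np.random.default_rng(0)
parents = [None, 0, 0, 1, 1]           # tree: 0 -> {1, 2}, 1 -> {3, 4}
label_emb = rng.normal(size=(5, 8))    # 5 labels, dimension 8
enc_states = rng.normal(size=(12, 8))  # 12 encoded input tokens
w = rng.normal(size=8)                 # hypothetical scoring vector
print(predict(label_emb, enc_states, parents, w))  # boolean array of length 5
```

Because all labels are present in the decoder from the start, each label representation can draw on the whole hierarchy in a single pass instead of depending on a left-to-right chain of earlier predictions, which is the limitation of seq2seq decoding that the abstract points to.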

Original language: English
Title of host publication: Advances in Information Retrieval - 47th European Conference on Information Retrieval, ECIR 2025, Proceedings
Editors: Claudia Hauff, Craig Macdonald, Dietmar Jannach, Gabriella Kazai, Franco Maria Nardini, Fabio Pinelli, Fabrizio Silvestri, Nicola Tonellotto
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 405-420
Number of pages: 16
ISBN (Print): 9783031887079
Publication status: Published - 1 Jan 2025
Event: 47th European Conference on Information Retrieval, ECIR 2025 - Lucca, Italy
Duration: 6 Apr 2025 to 10 Apr 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 15572 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 47th European Conference on Information Retrieval, ECIR 2025
Country/Territory: Italy
City: Lucca
Period: 6/04/25 to 10/04/25

Keywords

  • Hierarchical Multi-label Text Classification
  • Hierarchical Self-Attention Mechanism
  • Reproducibility
  • Seq2Seq
