
Graph-of-word and TW-IDF: New approach to Ad Hoc IR

  • Laboratoire d'Informatique (LIX)
  • Athens Univ. of Econ. and Business
  • Institut Mines-Télécom

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we introduce a novel document representation (graph-of-word) and retrieval model (TW-IDF) for ad hoc IR. Questioning the term independence assumption behind the traditional bag-of-word model, we propose a different representation of a document that captures the relationships between the terms using an unweighted directed graph of terms. From this graph, we extract at indexing time meaningful term weights (TW) that replace traditional term frequencies (TF) and from which we define a novel scoring function, namely TW-IDF, by analogy with TF-IDF. This approach leads to a retrieval model that consistently and significantly outperforms BM25 and, in some cases, its extension BM25+ on various standard TREC datasets. In particular, experiments show that, in the context of ad hoc IR, counting the number of different contexts in which a term occurs inside a document is more effective and relevant to search than considering an overall concave term frequency.
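
As a rough illustration of the idea in the abstract, the sketch below builds an unweighted directed graph of terms from a token list, uses each term's indegree as its term weight (TW), and combines it with an IDF factor. The abstract does not specify these details, so the sliding-window size, the choice of indegree as the weight, the length normalization with slope `b`, the IDF form, and all function and parameter names are assumptions made for the example, not the authors' stated implementation.

```python
import math
from collections import defaultdict


def graph_of_word(tokens, window=4):
    """Unweighted directed graph: edge u -> v when v follows u within
    a sliding window of `window` tokens (window size is an assumption)."""
    successors = defaultdict(set)
    for i, u in enumerate(tokens):
        # The window covers the current token plus the next window-1 tokens.
        for v in tokens[i + 1 : i + window]:
            if u != v:
                successors[u].add(v)
    return successors


def term_weights(tokens, window=4):
    """Term weight = indegree, i.e. the number of distinct terms that
    precede the term within the window (an assumed choice of weight)."""
    indegree = {t: 0 for t in tokens}
    for u, followers in graph_of_word(tokens, window).items():
        for v in followers:
            indegree[v] += 1
    return indegree


def tw_idf(query, doc_tokens, doc_freq, n_docs, avg_doc_len, b=0.003, window=4):
    """Score a document for a query, by analogy with TF-IDF.
    The pivoted length normalization and the IDF form are assumptions."""
    tw = term_weights(doc_tokens, window)
    norm = 1.0 - b + b * len(doc_tokens) / avg_doc_len
    score = 0.0
    for t in set(query):
        df = doc_freq.get(t, 0)
        if t in tw and df > 0:
            score += (tw[t] / norm) * math.log((n_docs + 1) / df)
    return score
```

Because the graph is unweighted, repeating a term next to the same neighbours does not raise its weight. Only new distinct contexts do, which is the contrast with raw term frequency that the abstract draws.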

Original language: English
Title of host publication: CIKM 2013 - Proceedings of the 22nd ACM International Conference on Information and Knowledge Management
Pages: 59-68
Number of pages: 10
DOIs
Publication status: Published - 11 Dec 2013
Event: 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013 - San Francisco, CA, United States
Duration: 27 Oct 2013 to 1 Nov 2013

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Conference

Conference: 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013
Country/Territory: United States
City: San Francisco, CA
Period: 27/10/13 to 1/11/13

Keywords

  • Graph representation of document
  • Graph-based term weighting
  • Graph-of-word
  • IR theory
  • Scoring functions
  • TW-IDF
