
Graph-of-word and TW-IDF: New approach to Ad Hoc IR

  • Laboratoire d'Informatique (LIX)
  • Athens Univ. of Econ. and Business
  • Institut Mines-Télécom

Research output: Chapter in a book, report, anthology or collection; Conference contribution; Peer-reviewed

Abstract

In this paper, we introduce a novel document representation (graph-of-word) and a retrieval model (TW-IDF) for ad hoc IR. Questioning the term independence assumption behind the traditional bag-of-word model, we propose a different representation of a document that captures the relationships between the terms using an unweighted directed graph of terms. From this graph, we extract at indexing time meaningful term weights (TW) that replace traditional term frequencies (TF) and from which we define a novel scoring function, namely TW-IDF, by analogy with TF-IDF. This approach leads to a retrieval model that consistently and significantly outperforms BM25 and in some cases its extension BM25+ on various standard TREC datasets. In particular, experiments show that counting the number of different contexts in which a term occurs inside a document is more effective and relevant to search than considering an overall concave term frequency in the context of ad hoc IR.
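
The idea in the abstract can be illustrated with a minimal Python sketch. It is one possible reading, not the authors' implementation: the abstract does not state the window size, the exact term weight, the IDF formula, or any length normalization, so the sliding window of 4, the use of indegree as the term weight, and the `log((N + 1) / df)` factor below are all assumptions, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def graph_of_word(tokens, window=4):
    """Unweighted directed graph of terms (assumed construction):
    an edge u -> v exists if v follows u within the sliding window."""
    edges = set()
    for i, u in enumerate(tokens):
        for v in tokens[i + 1 : i + window]:
            if u != v:  # skip self-loops
                edges.add((u, v))
    return edges


def term_weights(tokens, window=4):
    """Assumed term weight: indegree, i.e. the number of distinct
    terms that precede a term within the window (its 'contexts')."""
    indegree = {t: 0 for t in tokens}
    for _, v in graph_of_word(tokens, window):
        indegree[v] += 1
    return indegree


def tw_idf(query, docs, window=4):
    """Score each document as sum over query terms of TW * IDF.
    No document-length normalization is applied in this sketch."""
    n = len(docs)
    weights = [term_weights(d, window) for d in docs]
    df = defaultdict(int)  # document frequency of each term
    for w in weights:
        for t in w:
            df[t] += 1
    return [
        sum(w.get(t, 0) * math.log((n + 1) / df[t]) for t in query if t in df)
        for w in weights
    ]
```

Because the graph is unweighted, repeating a term in the same context adds no new edge, so the weight grows only when the term appears next to terms it has not been seen with before. This is the sense in which the sketch counts contexts rather than raw occurrences.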

Original language: English
Title of host publication: CIKM 2013 - Proceedings of the 22nd ACM International Conference on Information and Knowledge Management
Pages: 59-68
Number of pages: 10
State: Published - 11 Dec 2013
Event: 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013 - San Francisco, CA, United States
Duration: 27 Oct 2013 to 1 Nov 2013

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Conference

Conference: 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013
Country/Territory: United States
City: San Francisco, CA
Period: 27/10/13 to 1/11/13
