TY - GEN
T1 - Graph-of-word and TW-IDF
T2 - 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013
AU - Rousseau, François
AU - Vazirgiannis, Michalis
PY - 2013/12/11
Y1 - 2013/12/11
AB - In this paper, we introduce a novel document representation (graph-of-word) and retrieval model (TW-IDF) for ad hoc IR. Questioning the term independence assumption behind the traditional bag-of-word model, we propose a different representation of a document that captures the relationships between the terms using an unweighted directed graph of terms. From this graph, we extract meaningful term weights (TW) at indexing time that replace traditional term frequencies (TF) and from which we define a novel scoring function, namely TW-IDF, by analogy with TF-IDF. This approach leads to a retrieval model that consistently and significantly outperforms BM25 and in some cases its extension BM25+ on various standard TREC datasets. In particular, experiments show that counting the number of different contexts in which a term occurs inside a document is more effective and relevant to search than considering an overall concave term frequency in the context of ad hoc IR.
KW - Graph representation of document
KW - Graph-based term weighting
KW - Graph-of-word
KW - IR theory
KW - Scoring functions
KW - TW-IDF
UR - https://www.scopus.com/pages/publications/84889586401
U2 - 10.1145/2505515.2505671
DO - 10.1145/2505515.2505671
M3 - Conference contribution
AN - SCOPUS:84889586401
SN - 9781450322638
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 59
EP - 68
BT - CIKM 2013 - Proceedings of the 22nd ACM International Conference on Information and Knowledge Management
Y2 - 27 October 2013 through 1 November 2013
ER -