Efficient online novelty detection in news streams

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Novelty detection in text streams is a challenging task that arises in many different scenarios, ranging from email threads to RSS news feeds on a cell phone. An efficient novelty detection algorithm can save the user a great deal of time when accessing interesting information. Most recent research on detecting novel documents in text streams uses either geometric distances or distributional similarities; the former typically perform better but are slower, since an incoming document must be compared with all previously seen ones. In this paper, we propose a new novelty detection algorithm based on the Inverse Document Frequency (IDF) scoring function. Computing novelty based on IDF enables us to avoid similarity comparisons with previous documents in the text stream, thus leading to faster execution times. At the same time, our proposed approach outperforms several commonly used baselines when applied to a real-world dataset of news articles.
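
To make the idea in the abstract concrete, the following is a minimal sketch of how a document could be scored for novelty from IDF statistics alone, without comparing it to earlier documents. It is an illustration under stated assumptions, not the paper's actual scoring function: the class name `IdfNoveltyScorer`, the smoothed IDF formula, and the choice to average IDF over a document's distinct terms are all hypothetical.

```python
import math
from collections import Counter


class IdfNoveltyScorer:
    """Hypothetical streaming scorer; not the paper's exact formulation."""

    def __init__(self):
        # term -> number of previously seen documents containing the term
        self.doc_freq = Counter()
        self.num_docs = 0

    def idf(self, term):
        # Assumed smoothing: unseen terms get the largest finite weight.
        return math.log((self.num_docs + 1) / (self.doc_freq[term] + 1))

    def score(self, tokens):
        # Average IDF over distinct terms; higher suggests a more novel document.
        terms = set(tokens)
        if not terms:
            return 0.0
        return sum(self.idf(t) for t in terms) / len(terms)

    def update(self, tokens):
        # Fold the document into the running statistics.
        self.num_docs += 1
        self.doc_freq.update(set(tokens))


scorer = IdfNoveltyScorer()
scorer.update("quake hits city".split())
scorer.update("quake hits city again".split())

seen_score = scorer.score("quake hits city".split())
novel_score = scorer.score("election results announced".split())
print(seen_score < novel_score)
```

Scoring touches only the terms of the incoming document and a dictionary of counts, so its cost does not grow with the number of documents already seen, which is the efficiency argument the abstract makes. In a real system the score would presumably be compared against a tuned threshold; that threshold is not specified here.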

Original language: English
Title of host publication: Web Information Systems Engineering, WISE 2013 - 14th International Conference, Proceedings
Pages: 57-71
Number of pages: 15
Edition: PART 1
Publication status: Published - 18 Nov 2013
Event: 14th International Conference on Web Information Systems Engineering, WISE 2013 - Nanjing, China
Duration: 13 Oct 2013 to 15 Oct 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 1
Volume: 8180 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 14th International Conference on Web Information Systems Engineering, WISE 2013
Country/Territory: China
City: Nanjing
Period: 13/10/13 to 15/10/13

Keywords

  • inverse document frequency
  • news streams
  • novelty detection
