
Online evaluation of email streaming classifiers using GNUsmail

  • José M. Carmona-Cejudo
  • Manuel Baena-García
  • José Del Campo-Ávila
  • Albert Bifet
  • João Gama
  • Rafael Morales-Bueno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Real-time email classification is a challenging task because of its online nature, which makes it subject to concept drift. Identifying spam, where only two labels exist, has received great attention in the literature. We are nevertheless interested in classification involving multiple folders, which is an additional source of complexity. Moreover, neither cross-validation nor other sampling procedures are suitable for evaluating data streams. Therefore, other metrics, like the prequential error, have been proposed. However, the prequential error poses some problems, which can be alleviated by using mechanisms such as fading factors. In this paper we present GNUsmail, an open-source extensible framework for email classification, and focus on its ability to perform online evaluation. GNUsmail's architecture supports incremental and online learning, and it can be used to compare different online mining methods using state-of-the-art evaluation metrics. We show how GNUsmail, which includes a tool for launching replicable experiments, can be used to compare different algorithms.
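As an illustration of the evaluation idea mentioned in the abstract, the following is a minimal Python sketch of a prequential (test-then-train) error estimate with a fading factor. It is not GNUsmail code and does not use GNUsmail's API; the names `MajorityClassLearner`, `prequential_error`, and `alpha` are hypothetical and chosen only for this example. The sketch assumes the commonly used recursive formulation in which the faded loss sum is S_i = L_i + alpha * S_(i-1), the faded count is B_i = 1 + alpha * B_(i-1), and the estimate is S_i / B_i, so that alpha = 1 gives the plain prequential error and alpha < 1 discounts older mistakes.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Optional, Tuple


class MajorityClassLearner:
    """Toy incremental learner (a stand-in, not a GNUsmail classifier).

    It ignores the message features and predicts the folder label
    it has seen most often so far.
    """

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def predict(self, features: object) -> Optional[Hashable]:
        if not self.counts:
            return None  # nothing learned yet, so no prediction
        return self.counts.most_common(1)[0][0]

    def learn(self, features: object, label: Hashable) -> None:
        self.counts[label] += 1


def prequential_error(
    stream: Iterable[Tuple[object, Hashable]],
    learner: MajorityClassLearner,
    alpha: float = 1.0,
) -> List[float]:
    """Return the faded prequential error after each example.

    Each example is first used for testing and only then for training.
    alpha is the fading factor, with 0 < alpha <= 1.
    """
    faded_loss = 0.0   # S_i: discounted sum of 0/1 losses
    faded_count = 0.0  # B_i: discounted number of examples
    errors: List[float] = []
    for features, label in stream:
        # Test first: 0/1 loss of the prediction made before seeing the label.
        loss = 0.0 if learner.predict(features) == label else 1.0
        faded_loss = loss + alpha * faded_loss
        faded_count = 1.0 + alpha * faded_count
        errors.append(faded_loss / faded_count)
        # Then train on the same example.
        learner.learn(features, label)
    return errors


# Hypothetical four-message stream of (features, folder) pairs.
example_stream = [("m1", "inbox"), ("m2", "inbox"), ("m3", "work"), ("m4", "inbox")]
plain = prequential_error(example_stream, MajorityClassLearner(), alpha=1.0)
faded = prequential_error(example_stream, MajorityClassLearner(), alpha=0.5)
```

With a fading factor below 1, a mistake on an early message weighs less in later estimates, which is why this kind of mechanism is attractive when the folder distribution drifts over time.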

Original language: English
Title of host publication: Advances in Intelligent Data Analysis X - 10th International Symposium, IDA 2011, Proceedings
Pages: 90-100
Number of pages: 11
Publication status: Published - 9 Nov 2011
Externally published: Yes
Event: 10th International Symposium on Intelligent Data Analysis, IDA 2011 - Porto, Portugal
Duration: 29 Oct 2011 to 31 Oct 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7014 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th International Symposium on Intelligent Data Analysis, IDA 2011
Country/Territory: Portugal
City: Porto
Period: 29/10/11 to 31/10/11

Keywords

  • Concept Drift
  • Email Classification
  • Online Methods
  • Text Mining
