TY - GEN
T1 - Online evaluation of email streaming classifiers using GNUsmail
AU - Carmona-Cejudo, José M.
AU - Baena-García, Manuel
AU - Del Campo-Ávila, José
AU - Bifet, Albert
AU - Gama, João
AU - Morales-Bueno, Rafael
PY - 2011/11/9
Y1 - 2011/11/9
AB - Real-time email classification is a challenging task because of its online nature, subject to concept drift. Identifying spam, where only two labels exist, has received great attention in the literature. We are nevertheless interested in classification involving multiple folders, which is an additional source of complexity. Moreover, neither cross-validation nor other sampling procedures are suitable for data stream evaluation. Therefore, other metrics, like the prequential error, have been proposed. However, the prequential error poses some problems, which can be alleviated by using mechanisms such as fading factors. In this paper we present GNUsmail, an open-source extensible framework for email classification, and focus on its ability to perform online evaluation. GNUsmail's architecture supports incremental and online learning, and it can be used to compare different online mining methods using state-of-the-art evaluation metrics. We show how GNUsmail, which includes a tool for launching replicable experiments, can be used to compare different algorithms.
KW - Concept Drift
KW - Email Classification
KW - Online Methods
KW - Text Mining
UR - https://www.scopus.com/pages/publications/80455129960
U2 - 10.1007/978-3-642-24800-9_11
DO - 10.1007/978-3-642-24800-9_11
M3 - Conference contribution
AN - SCOPUS:80455129960
SN - 9783642247996
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 90
EP - 100
BT - Advances in Intelligent Data Analysis X - 10th International Symposium, IDA 2011, Proceedings
T2 - 10th International Symposium on Intelligent Data Analysis, IDA 2011
Y2 - 29 October 2011 through 31 October 2011
ER -