TY - GEN
T1 - GNUsmail
T2 - 2nd Workshop on Knowledge Representation for Health Care, KR4HC 2010, held in conjunction with the 19th European Conference on Artificial Intelligence, ECAI 2010
AU - Carmona-Cejudo, José M.
AU - Baena-García, Manuel
AU - Del Campo-Ávila, José
AU - Morales-Bueno, Rafael
AU - Bifet, Albert
PY - 2010/1/1
Y1 - 2010/1/1
N2 - Real-time classification of massive email data is a challenging task that presents its own particular difficulties. Since email data has an important temporal component, several problems arise: emails arrive continuously, and the criteria used to classify those emails can change, so the learning algorithms have to be able to deal with concept drift. Our problem is more general than spam detection, which has received much more attention in the literature. In this paper we present GNUsmail, an open-source extensible framework for email classification, whose structure supports incremental and on-line learning. This framework enables the incorporation of algorithms developed by other researchers, such as those included in WEKA and MOA. We evaluate this framework, characterized by two overlapping phases (pre-processing and learning), using the ENRON dataset, and we compare the results achieved by WEKA and MOA algorithms.
AB - Real-time classification of massive email data is a challenging task that presents its own particular difficulties. Since email data has an important temporal component, several problems arise: emails arrive continuously, and the criteria used to classify those emails can change, so the learning algorithms have to be able to deal with concept drift. Our problem is more general than spam detection, which has received much more attention in the literature. In this paper we present GNUsmail, an open-source extensible framework for email classification, whose structure supports incremental and on-line learning. This framework enables the incorporation of algorithms developed by other researchers, such as those included in WEKA and MOA. We evaluate this framework, characterized by two overlapping phases (pre-processing and learning), using the ENRON dataset, and we compare the results achieved by WEKA and MOA algorithms.
UR - https://www.scopus.com/pages/publications/77956052318
U2 - 10.3233/978-1-60750-606-5-1141
DO - 10.3233/978-1-60750-606-5-1141
M3 - Conference contribution
AN - SCOPUS:77956052318
SN - 9781607506058
T3 - Frontiers in Artificial Intelligence and Applications
SP - 1141
EP - 1142
BT - ECAI 2010
PB - IOS Press
Y2 - 17 August 2010 through 17 August 2010
ER -