Extremely fast decision tree mining for evolving data streams

  • Albert Bifet
  • Jiajin Zhang
  • Wei Fan
  • Cheng He
  • Jianfeng Zhang
  • Jianfeng Qian
  • Geoff Holmes
  • Bernhard Pfahringer

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Real-time industrial applications now generate huge amounts of data continuously, every day. To process these large data streams, we need fast and efficient methodologies and systems. Data scientists and analysts also want machine learning models that are easy to visualize and understand. Decision trees are preferred in many real-time applications for this reason, and also because, combined in an ensemble, they are among the most powerful methods in machine learning. In this paper, we present a new system called streamDM-C++ that implements decision trees for data streams in C++ and that has been used extensively at Huawei. Streaming decision trees adapt to changes in streams, a huge advantage since standard decision trees are built from a snapshot of the data and cannot evolve over time. streamDM-C++ is easy to extend, and it contains more powerful ensemble methods and more efficient, easier-to-use adaptive decision trees. We compare our new implementation with VFML, the current state-of-the-art implementation in C, and show that our new system outperforms VFML in speed while using fewer resources.

Original language: English
Title of host publication: KDD 2017 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 1733-1742
Number of pages: 10
ISBN (Electronic): 9781450348874
Publication status: Published - 13 Aug 2017
Externally published: Yes
Event: 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017 - Halifax, Canada
Duration: 13 Aug 2017 to 17 Aug 2017

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume: Part F129685

Conference

Conference: 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017
Country/Territory: Canada
City: Halifax
Period: 13/08/17 to 17/08/17

Keywords

  • Classification
  • Data streams
  • Decision trees
  • Online learning
