Improving performances of log mining for anomaly prediction through nlp-based log parsing

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Failure prediction of industrial systems is a promising application domain for data mining approaches and should naturally rely on log messages, which are a prime source of data because many systems generate them. However, before relevant information can be extracted from such log messages, another critical step is to parse the logs, that is, to transform the raw unstructured text of the log messages into a suitable input for data mining. These two problems (log parsing, then log mining) are often studied separately although they are directly related in the context of failure prediction; moreover, few performance benchmarks are publicly available. In this paper, we focus on the impact of log parsing techniques based on natural language processing on the performance of log mining on two datasets. The first is a log of an industrial aeronautical system comprising over 4,500,000 messages collected over one year of operation; the second is a public benchmark set from an HDFS cluster. On the latter, we show that it is possible to raise the F-score from 96% to 99.2% while using simpler and more robust log parsing techniques that require less parameter tuning, provided that they are correctly combined with log mining techniques.
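
The two steps named in the abstract (parsing raw log text into a mining-ready input, then scoring the predictions with an F-score) can be illustrated with a minimal Python sketch. It is not the authors' method: the word-level tokenizer, the bag-of-words representation, and the function names `tokenize`, `bag_of_words`, and `f_score` are assumptions chosen for illustration, and the sample messages are invented rather than taken from either dataset.

```python
import re
from collections import Counter


def tokenize(message):
    """Split a raw log message into lowercase alphabetic word tokens.

    Assumption: digits, punctuation, and identifiers such as IP addresses
    are simply dropped, a crude stand-in for NLP-style preprocessing.
    """
    return re.findall(r"[a-z]+", message.lower())


def bag_of_words(messages):
    """Count token occurrences over a window of log messages."""
    counts = Counter()
    for message in messages:
        counts.update(tokenize(message))
    return counts


def f_score(tp, fp, fn):
    """F1 score from true positives, false positives, and false negatives.

    Uses F1 = 2*TP / (2*TP + FP + FN), which equals the harmonic mean
    of precision and recall.
    """
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical messages, not drawn from the paper's datasets.
    window = ["Error reading block", "error writing block blk_7"]
    print(bag_of_words(window))
    print(f_score(tp=8, fp=2, fn=2))
```

The resulting token counts could serve as a feature vector for a downstream classifier; which log mining technique to pair with which parsing technique is the question the paper itself studies.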

Original language: English
Title of host publication: Proceedings - 26th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 237-243
Number of pages: 7
ISBN (Electronic): 9781538668863
Publication status: Published - 7 Nov 2018
Externally published: Yes
Event: 26th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2018 - Milwaukee, United States
Duration: 25 Sept 2018 to 28 Sept 2018

Publication series

Name: Proceedings - 26th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2018

Conference

Conference: 26th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2018
Country/Territory: United States
City: Milwaukee
Period: 25/09/18 to 28/09/18

Keywords

  • Data Mining
  • Failure Prediction
  • Log Parsing
  • Machine Learning
  • Natural Language Processing
