Abstract
Sequential pattern mining (SPM) discovers patterns of events that follow a sequential order from event transactions recorded over time. In this work, we introduce a new efficient sequential pattern mining algorithm called VEPRECO. VEPRECO makes three main contributions that speed up the mining process: a vertical representation of patterns, pre-pruning strategies that avoid checking infrequent patterns, and common candidate selection policies that reduce the number of iterations performed by the algorithm. An experimental evaluation was performed on synthetic and real-world datasets, and the results were compared with those of the CM-SPAM algorithm, the most time- and memory-efficient sequential pattern mining algorithm in the literature, which we took as a baseline. We analysed separately how each of the proposed contributions affects time and memory usage and found that the proposed pattern representation yielded the largest reduction in both. Pre-pruning strategies and common candidate selection policies reduce runtime on datasets with many sequences and similar transaction and sequence lengths.
| Original language | English |
|---|---|
| Article number | 117517 |
| Journal | Expert Systems with Applications |
| Volume | 204 |
| DOIs | |
| Publication status | Published - 15 Oct 2022 |
Keywords
- Knowledge discovery
- Pattern mining
- Pattern representation
- Pruning strategies
- Sequential pattern mining
Fingerprint
Dive into the research topics of 'VEPRECO: Vertical databases with pre-pruning strategies and common candidate selection policies to fasten sequential pattern mining'. Together they form a unique fingerprint.