Abstract
Automatically segmenting text corpora into thematically related groups is a complex exploratory analysis problem. In this article, we outline our multi-stage exploratory analysis process and investigate the performance of a simple statistical model. After a description of this model and of its fitting procedure, we illustrate its performance on the segmentation of a corpus of CKM-related texts in English.
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 13-22 |
| Number of pages | 10 |
| Journal | Management Information Systems |
| Volume | 10 |
| Publication status | Published - 1 Dec 2004 |
| Event | Fifth International Conference on Data Mining, DATA MINING V - Malaga, Spain; Duration: 15 Sept 2004 → 17 Sept 2004 |
Keywords
- Clustering
- Exploratory analysis
- Mixture model
- Text mining