Declarative XML data cleaning with XClean

Melanie Weis, Ioana Manolescu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Data cleaning is the process of correcting anomalies in a data source, which may for instance be due to typographical errors or duplicate representations of an entity. It is a crucial task in customer relationship management, data mining, and data integration. With the growing amount of XML data, approaches to effectively and efficiently clean XML are needed, an issue not addressed by existing data cleaning systems, which mostly specialize in relational data. We present XClean, a data cleaning framework specifically geared towards cleaning XML data. XClean's approach is based on a set of cleaning operators whose semantics is well defined in terms of XML algebraic operators. Users may specify cleaning programs by combining operators in a declarative XClean/PL program, which is then compiled into XQuery. We describe XClean's operators, language, and compilation approach, and validate its effectiveness through a series of case studies.

Original language: English
Title of host publication: Advanced Information Systems Engineering - 19th International Conference, CAiSE 2007, Proceedings
Publisher: Springer Verlag
Pages: 96-110
Number of pages: 15
ISBN (Print): 9783540729877
Publication status: Published - 1 Jan 2007
Externally published: Yes
Event: 19th International Conference on Advanced Information Systems Engineering, CAiSE 2007 - Trondheim, Norway
Duration: 11 Jun 2007 to 15 Jun 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4495 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 19th International Conference on Advanced Information Systems Engineering, CAiSE 2007
Country/Territory: Norway
City: Trondheim
Period: 11/06/07 to 15/06/07
