Linear time membership in a class of regular expressions with interleaving and counting

Giorgio Ghelli, Dario Colazzo, Carlo Sartiani

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

The extension of Regular Expressions (REs) with an interleaving (shuffle) operator has been proposed on many occasions, since it would be crucial for dealing with unordered data. However, interleaving badly affects the complexity of basic operations and, in particular, makes membership NP-hard [13], which is unacceptable for most uses of REs. REs form the basis of most XML type languages, such as DTDs, XML Schema types, and XDuce types [16, 11]. In this context, the interleaving operator would be a natural addition to the language of REs, as witnessed by the presence of limited forms of interleaving in XSD (the all group), Relax-NG, and SGML, provided that the NP-hardness of membership could be avoided. We present here a restricted class of REs with interleaving and counting which admits a linear membership algorithm, and which is expressive enough to cover the vast majority of real-world XML types. We first present an algorithm for membership of a list of words in an RE with interleaving and counting, based on the translation of the RE into a set of constraints. We then generalize the approach to check membership of XML trees in a class of EDTDs with interleaving and counting, which models the crucial aspects of DTDs and XSD schemas.
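
As a rough illustration of why counting constraints can make membership cheap for unordered content, the sketch below checks a word against an interleaving of single symbols, each carrying an occurrence range, in one pass over the word. It is a toy special case written for this summary, not the constraint translation described in the paper; the names `matches_counted_interleaving`, `bounds`, and `example_bounds` are hypothetical.

```python
from collections import Counter
from math import inf


def matches_counted_interleaving(word, bounds):
    """Check word against a1[m1..n1] & a2[m2..n2] & ... & ak[mk..nk].

    `bounds` maps each symbol to a (min, max) occurrence range.
    Assumption: every operand of the interleaving is a single counted
    symbol, so order is irrelevant and only occurrence counts matter.
    """
    counts = Counter()
    for symbol in word:
        if symbol not in bounds:
            return False  # symbol not allowed by the expression
        counts[symbol] += 1
        if counts[symbol] > bounds[symbol][1]:
            return False  # upper bound exceeded
    # Every symbol must reach its lower bound (missing symbols count as 0).
    return all(counts[s] >= low for s, (low, _high) in bounds.items())


# Hypothetical content model: a[1..2] & b[0..1] & c[1..*]
example_bounds = {"a": (1, 2), "b": (0, 1), "c": (1, inf)}

print(matches_counted_interleaving(["c", "a", "c"], example_bounds))
print(matches_counted_interleaving(["a", "b", "b", "c"], example_bounds))
```

The check runs in time proportional to the length of the word plus the number of symbols in the expression. The restricted class in the paper is richer than this special case, so the sketch only conveys the flavor of a constraint-based test.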

Original language: English
Title of host publication: Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM'08
Pages: 389-398
Number of pages: 10
Publication status: Published - 1 Dec 2008
Externally published: Yes
Event: 17th ACM Conference on Information and Knowledge Management, CIKM'08 - Napa Valley, CA, United States
Duration: 26 Oct 2008 to 30 Oct 2008

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Conference

Conference: 17th ACM Conference on Information and Knowledge Management, CIKM'08
Country/Territory: United States
City: Napa Valley, CA
Period: 26/10/08 to 30/10/08

Keywords

  • Theory
