TY - GEN
T1 - Efficient asymmetric inclusion between Regular Expression types
AU - Colazzo, Dario
AU - Ghelli, Giorgio
AU - Sartiani, Carlo
PY - 2009/3/23
Y1 - 2009/3/23
N2 - The inclusion of Regular Expressions (REs) is the kernel of any subtype checking algorithm for XML schema languages. XML applications would benefit from the extension of REs with interleaving and counting, but this is not feasible in general, since inclusion is EXPSPACE-complete for such extended REs. In [9] we introduced a notion of "conflict-free REs", which are extended REs with excellent complexity behaviour, including a cubic inclusion algorithm [9] and linear membership [10]. Conflict-free REs have interleaving and counting, but the complexity is tamed by the "conflict-free" limitations, which have been found to be satisfied by the vast majority of the content models published on the Web. However, the most important use of subtype checking is in the type-checking of XML manipulation languages. A type checker works by testing the inclusion of inferred subtypes in declared supertypes. The conflict-free restriction, while quite harmless for the human-defined supertype, is far too restrictive for the inferred subtype, whose shape is difficult to constrain. We show here that the PTIME inclusion algorithm can actually be extended to deal with totally unrestricted REs with counting and interleaving in the subtype position, provided that the supertype is conflict-free. This is exactly the expressive power that we need in order to use subtyping inside type-checking algorithms, and the cost of this generalized algorithm is only quadratic, which is as good as the best algorithm we have for the symmetric case (see [5]). The result is extremely surprising, since we had previously found that asymmetric inclusion becomes NP-hard as soon as the candidate subtype is enriched with binary intersection, a generalization that looked much more innocent than what we achieve here.
AB - The inclusion of Regular Expressions (REs) is the kernel of any subtype checking algorithm for XML schema languages. XML applications would benefit from the extension of REs with interleaving and counting, but this is not feasible in general, since inclusion is EXPSPACE-complete for such extended REs. In [9] we introduced a notion of "conflict-free REs", which are extended REs with excellent complexity behaviour, including a cubic inclusion algorithm [9] and linear membership [10]. Conflict-free REs have interleaving and counting, but the complexity is tamed by the "conflict-free" limitations, which have been found to be satisfied by the vast majority of the content models published on the Web. However, the most important use of subtype checking is in the type-checking of XML manipulation languages. A type checker works by testing the inclusion of inferred subtypes in declared supertypes. The conflict-free restriction, while quite harmless for the human-defined supertype, is far too restrictive for the inferred subtype, whose shape is difficult to constrain. We show here that the PTIME inclusion algorithm can actually be extended to deal with totally unrestricted REs with counting and interleaving in the subtype position, provided that the supertype is conflict-free. This is exactly the expressive power that we need in order to use subtyping inside type-checking algorithms, and the cost of this generalized algorithm is only quadratic, which is as good as the best algorithm we have for the symmetric case (see [5]). The result is extremely surprising, since we had previously found that asymmetric inclusion becomes NP-hard as soon as the candidate subtype is enriched with binary intersection, a generalization that looked much more innocent than what we achieve here.
KW - Language inclusion
KW - Regular Expressions
KW - XML
U2 - 10.1145/1514894.1514916
DO - 10.1145/1514894.1514916
M3 - Conference contribution
AN - SCOPUS:70349155859
SN - 9781605584232
T3 - ACM International Conference Proceeding Series
SP - 174
EP - 182
BT - Proceedings of the 12th International Conference on Database Theory, ICDT'09
PB - Association for Computing Machinery (ACM)
T2 - 12th International Conference on Database Theory, ICDT 2009
Y2 - 23 March 2009 through 25 March 2009
ER -