On Distances Between Words with Parameters

Pierre Bourhis, Aaron Boussidan, Philippe Gambette

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

The edit distance between parameterized words is a generalization of the classical edit distance where it is allowed to map particular letters of the first word, called parameters, to parameters of the second word before computing the distance. This problem has been introduced in particular for detection of code duplication, and the notion of words with parameters has also been used with different semantics in other fields. The complexity of several variants of edit distances between parameterized words has been studied, however, the complexity of the most natural one, the Levenshtein distance, remained open. In this paper, we solve this open question and close the exhaustive analysis of all cases of parameterized word matching and function matching, showing that these problems are np-complete. To this aim, we also provide a comparison of the different problems, exhibiting several equivalences between them. We also provide and implement a MaxSAT encoding of the problem, as well as a simple FPT algorithm in the alphabet size, and study their efficiency on real data in the context of theater play structure comparison.

Original languageEnglish
Title of host publication34th Annual Symposium on Combinatorial Pattern Matching, CPM 2023
EditorsLaurent Bulteau, Zsuzsanna Liptak
PublisherSchloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
ISBN (Electronic)9783959772761
DOIs
Publication statusPublished - 1 Jun 2023
Externally publishedYes
Event34th Annual Symposium on Combinatorial Pattern Matching, CPM 2023 - Marne-la-Vallee, France
Duration: 26 Jun 202328 Jun 2023

Publication series

NameLeibniz International Proceedings in Informatics, LIPIcs
Volume259
ISSN (Print)1868-8969

Conference

Conference34th Annual Symposium on Combinatorial Pattern Matching, CPM 2023
Country/TerritoryFrance
CityMarne-la-Vallee
Period26/06/2328/06/23

Keywords

  • Levenshtein
  • MAX-SAT
  • NP-completeness
  • String matching
  • edit distance
  • instantiable words
  • parameter words
  • parameterized matching
  • parameterized words

Fingerprint

Dive into the research topics of 'On Distances Between Words with Parameters'. Together they form a unique fingerprint.

Cite this