TY - GEN
T1 - RNA Inverse Folding Can Be Solved in Linear Time for Structures Without Isolated Stacks or Base Pairs
AU - Boury, Théo
AU - Bulteau, Laurent
AU - Ponty, Yann
N1 - Publisher Copyright:
© Théo Boury, Laurent Bulteau, and Yann Ponty;
PY - 2024/8/1
Y1 - 2024/8/1
N2 - Inverse folding is a classic instance of negative RNA design which consists in finding a sequence that uniquely folds into a target secondary structure with respect to energy minimization. A breakthrough result of Bonnet et al. shows that, even in simple base pairs-based (BP) models, the decision version of a mildly constrained version of inverse folding is NP-hard. In this work, we show that inverse folding can be solved in linear time for a large collection of targets, including every structure that contains no isolated BP and no isolated stack (or, equivalently, when all helices consist of 3+ base pairs). For structures featuring shorter helices, our linear algorithm is no longer guaranteed to produce a solution, but still does so for a large proportion of instances. Our approach introduces a notion of modulo m-separability, generalizing a property pioneered by Hales et al. Separability is a sufficient condition for the existence of a solution to the inverse folding problem. We show that, for any input secondary structure of length n, a modulo m-separated sequence can be produced in time O(n2m) anytime such a sequence exists. Meanwhile, we show that any structure consisting of 3+ base pairs is either trivially non-designable, or always admits a modulo-2 separated solution (m = 2). Solution sequences can thus be produced in linear time, and even be uniformly generated within the set of modulo-2 separable sequences.
AB - Inverse folding is a classic instance of negative RNA design which consists in finding a sequence that uniquely folds into a target secondary structure with respect to energy minimization. A breakthrough result of Bonnet et al. shows that, even in simple base pairs-based (BP) models, the decision version of a mildly constrained version of inverse folding is NP-hard. In this work, we show that inverse folding can be solved in linear time for a large collection of targets, including every structure that contains no isolated BP and no isolated stack (or, equivalently, when all helices consist of 3+ base pairs). For structures featuring shorter helices, our linear algorithm is no longer guaranteed to produce a solution, but still does so for a large proportion of instances. Our approach introduces a notion of modulo m-separability, generalizing a property pioneered by Hales et al. Separability is a sufficient condition for the existence of a solution to the inverse folding problem. We show that, for any input secondary structure of length n, a modulo m-separated sequence can be produced in time O(n2m) anytime such a sequence exists. Meanwhile, we show that any structure consisting of 3+ base pairs is either trivially non-designable, or always admits a modulo-2 separated solution (m = 2). Solution sequences can thus be produced in linear time, and even be uniformly generated within the set of modulo-2 separable sequences.
KW - Parameterized Complexity
KW - RNA structure
KW - String Design
KW - Uniform Sampling
UR - https://www.scopus.com/pages/publications/85203591792
U2 - 10.4230/LIPIcs.WABI.2024.19
DO - 10.4230/LIPIcs.WABI.2024.19
M3 - Conference contribution
AN - SCOPUS:85203591792
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 24th International Workshop on Algorithms in Bioinformatics, WABI 2024
A2 - Pissis, Solon P.
A2 - Sung, Wing-Kin
PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
T2 - 24th International Workshop on Algorithms in Bioinformatics, WABI 2024
Y2 - 2 September 2024 through 4 September 2024
ER -