TY - GEN
T1 - Fast and accurate structure probability estimation for simultaneous alignment and folding of RNAs
AU - Miladi, Milad
AU - Raden, Martin
AU - Will, Sebastian
AU - Backofen, Rolf
N1 - Publisher Copyright:
© Milad Miladi, Martin Raden, Sebastian Will, and Rolf Backofen; licensed under Creative Commons License CC-BY
PY - 2019/9/1
Y1 - 2019/9/1
N2 - Motivation: Simultaneous alignment and folding (SA&F) of RNAs is the indispensable gold standard for inferring the structure of non-coding RNAs and their general analysis. The original algorithm, proposed by Sankoff, solves the theoretical problem exactly with a complexity of O(n6) in the full energy model. Over the last two decades, several variants and improvements of the Sankoff algorithm have been proposed to reduce its extreme complexity by proposing simplified energy models or imposing restrictions on the predicted alignments. Results: Here we introduce a novel variant of Sankoff's algorithm that reconciles the simplifications of PMcomp, namely moving from the full energy model to a simpler base pair-based model, with the accuracy of the loop-based full energy model. Instead of estimating pseudo-energies from unconditional base pair probabilities, our model calculates energies from conditional base pair probabilities that allow to accurately capture structure probabilities, which obey a conditional dependency. Supporting modifications with surgical precision, this model gives rise to the fast and highly accurate novel algorithm Pankov (Probabilistic Sankoff-like simultaneous alignment and folding of RNAs inspired by Markov chains). Pankov benefits from the speed-up of excluding unreliable base-pairing without compromising the loop-based free energy model of the Sankoff's algorithm. We show that Pankov outperforms its predecessors LocARNA and SPARSE in folding quality and is faster than LocARNA. Pankov is developed as a branch of the LocARNA package and available at https://github.com/mmiladi/Pankov.
AB - Motivation: Simultaneous alignment and folding (SA&F) of RNAs is the indispensable gold standard for inferring the structure of non-coding RNAs and their general analysis. The original algorithm, proposed by Sankoff, solves the theoretical problem exactly with a complexity of O(n6) in the full energy model. Over the last two decades, several variants and improvements of the Sankoff algorithm have been proposed to reduce its extreme complexity by proposing simplified energy models or imposing restrictions on the predicted alignments. Results: Here we introduce a novel variant of Sankoff's algorithm that reconciles the simplifications of PMcomp, namely moving from the full energy model to a simpler base pair-based model, with the accuracy of the loop-based full energy model. Instead of estimating pseudo-energies from unconditional base pair probabilities, our model calculates energies from conditional base pair probabilities that allow to accurately capture structure probabilities, which obey a conditional dependency. Supporting modifications with surgical precision, this model gives rise to the fast and highly accurate novel algorithm Pankov (Probabilistic Sankoff-like simultaneous alignment and folding of RNAs inspired by Markov chains). Pankov benefits from the speed-up of excluding unreliable base-pairing without compromising the loop-based free energy model of the Sankoff's algorithm. We show that Pankov outperforms its predecessors LocARNA and SPARSE in folding quality and is faster than LocARNA. Pankov is developed as a branch of the LocARNA package and available at https://github.com/mmiladi/Pankov.
KW - Algorithms
KW - Alignment
KW - RNA secondary structure
KW - Structural bioinformatics
UR - https://www.scopus.com/pages/publications/85072629236
U2 - 10.4230/LIPIcs.WABI.2019.14
DO - 10.4230/LIPIcs.WABI.2019.14
M3 - Conference contribution
AN - SCOPUS:85072629236
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 19th International Workshop on Algorithms in Bioinformatics, WABI 2019
A2 - Huber, Katharina T.
A2 - Gusfield, Dan
PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
T2 - 19th International Workshop on Algorithms in Bioinformatics, WABI 2019
Y2 - 8 September 2019 through 10 September 2019
ER -