TY - JOUR
T1 - Knotty
T2 - Efficient and accurate prediction of complex RNA pseudoknot structures
AU - Jabbari, Hosna
AU - Wark, Ian
AU - Montemagno, Carlo
AU - Will, Sebastian
N1 - Publisher Copyright:
© The Author(s) 2018. Published by Oxford University Press. All rights reserved.
PY - 2018/11/15
Y1 - 2018/11/15
N2 - Motivation The computational prediction of RNA secondary structure by free energy minimization has become an important tool in RNA research. However in practice, energy minimization is mostly limited to pseudoknot-free structures or rather simple pseudoknots, not covering many biologically important structures such as kissing hairpins. Algorithms capable of predicting sufficiently complex pseudoknots (for sequences of length n) used to have extreme complexities, e.g. Pknots has O(n6) time and O(n4) space complexity. The algorithm CCJ dramatically improves the asymptotic run time for predicting complex pseudoknots (handling almost all relevant pseudoknots, while being slightly less general than Pknots), but this came at the cost of large constant factors in space and time, which strongly limited its practical application (-1/4200 bases already require 256 GB space). Results We present a CCJ-type algorithm, Knotty, that handles the same comprehensive pseudoknot class of structures as CCJ with improved space complexity of (n3+Z) - due to the applied technique of sparsification, the number of 'candidates', Z, appears to grow significantly slower than n 4 on our benchmark set (which include pseudoknotted RNAs up to 400 nt). In terms of run time over this benchmark, Knotty clearly outperforms Pknots and the original CCJ implementation, CCJ 1.0; Knotty's space consumption fundamentally improves over CCJ 1.0, being on a par with the space-economic Pknots. By comparing to CCJ 2.0, our unsparsified Knotty variant, we demonstrate the isolated effect of sparsification. Moreover, Knotty employs the state-of-the-art energy model of 'HotKnots DP09', which results in superior prediction accuracy over Pknots. Availability and implementation Our software is available at https://github.com/HosnaJabbari/Knotty. Supplementary informationSupplementary dataare available at Bioinformatics online.
AB - Motivation The computational prediction of RNA secondary structure by free energy minimization has become an important tool in RNA research. However in practice, energy minimization is mostly limited to pseudoknot-free structures or rather simple pseudoknots, not covering many biologically important structures such as kissing hairpins. Algorithms capable of predicting sufficiently complex pseudoknots (for sequences of length n) used to have extreme complexities, e.g. Pknots has O(n6) time and O(n4) space complexity. The algorithm CCJ dramatically improves the asymptotic run time for predicting complex pseudoknots (handling almost all relevant pseudoknots, while being slightly less general than Pknots), but this came at the cost of large constant factors in space and time, which strongly limited its practical application (-1/4200 bases already require 256 GB space). Results We present a CCJ-type algorithm, Knotty, that handles the same comprehensive pseudoknot class of structures as CCJ with improved space complexity of (n3+Z) - due to the applied technique of sparsification, the number of 'candidates', Z, appears to grow significantly slower than n 4 on our benchmark set (which include pseudoknotted RNAs up to 400 nt). In terms of run time over this benchmark, Knotty clearly outperforms Pknots and the original CCJ implementation, CCJ 1.0; Knotty's space consumption fundamentally improves over CCJ 1.0, being on a par with the space-economic Pknots. By comparing to CCJ 2.0, our unsparsified Knotty variant, we demonstrate the isolated effect of sparsification. Moreover, Knotty employs the state-of-the-art energy model of 'HotKnots DP09', which results in superior prediction accuracy over Pknots. Availability and implementation Our software is available at https://github.com/HosnaJabbari/Knotty. Supplementary informationSupplementary dataare available at Bioinformatics online.
UR - https://www.scopus.com/pages/publications/85056426458
U2 - 10.1093/bioinformatics/bty420
DO - 10.1093/bioinformatics/bty420
M3 - Article
C2 - 29868872
AN - SCOPUS:85056426458
SN - 1367-4803
VL - 34
SP - 3849
EP - 3856
JO - Bioinformatics
JF - Bioinformatics
IS - 22
ER -