TY - GEN
T1 - Sparsification enables predicting kissing hairpin pseudoknot structures of long RNAs in practice
AU - Jabbari, Hosna
AU - Wark, Ian
AU - Montemagno, Carlo
AU - Will, Sebastian
N1 - Publisher Copyright:
© Hosna Jabbari, Ian Wark, Carlo Montemagno, and Sebastian Will.
PY - 2017/8/1
Y1 - 2017/8/1
N2 - While computational RNA secondary structure prediction is an important tool in RNA research, it is still fundamentally limited to pseudoknot-free structures (or at best very simple pseudoknots) in practice. Here, we make the prediction of complex pseudoknots - including kissing hairpin structures - practically applicable by reducing the originally high space consumption. For this aim, we apply the technique of sparsification and other space-saving modifications to the recurrences of the pseudoknot prediction algorithm by Chen, Condon and Jabbari (CCJ algorithm). Thus, the theoretical space complexity of free energy minimization is reduced to -(n3 + Z), in the sequence length n and the number of non-optimally decomposable fragments ("candidates") Z. The sparsified CCJ algorithm, sparseCCJ, is presented in detail. Moreover, we provide and compare three generations of CCJ implementations, which continuously improve the space requirements: the original CCJ implementation, our first modified implementation, and our final sparsified implementation. The two latest implementations implement the established HotKnots DP09 energy model. In our experiments, using 244GB of RAM, the original CCJ implementation failed to handle sequences longer than 195 bases; sparseCCJ handles our pseudoknot data set (up to about length 400 bases) in this space limit. All three CCJ implementations are available at https://github.com/HosnaJabbari/CCJ.
AB - While computational RNA secondary structure prediction is an important tool in RNA research, it is still fundamentally limited to pseudoknot-free structures (or at best very simple pseudoknots) in practice. Here, we make the prediction of complex pseudoknots - including kissing hairpin structures - practically applicable by reducing the originally high space consumption. For this aim, we apply the technique of sparsification and other space-saving modifications to the recurrences of the pseudoknot prediction algorithm by Chen, Condon and Jabbari (CCJ algorithm). Thus, the theoretical space complexity of free energy minimization is reduced to -(n3 + Z), in the sequence length n and the number of non-optimally decomposable fragments ("candidates") Z. The sparsified CCJ algorithm, sparseCCJ, is presented in detail. Moreover, we provide and compare three generations of CCJ implementations, which continuously improve the space requirements: the original CCJ implementation, our first modified implementation, and our final sparsified implementation. The two latest implementations implement the established HotKnots DP09 energy model. In our experiments, using 244GB of RAM, the original CCJ implementation failed to handle sequences longer than 195 bases; sparseCCJ handles our pseudoknot data set (up to about length 400 bases) in this space limit. All three CCJ implementations are available at https://github.com/HosnaJabbari/CCJ.
KW - Pseudoknots
KW - RNA
KW - Secondary structure prediction
KW - Space efficiency
KW - Sparsification
UR - https://www.scopus.com/pages/publications/85028777576
U2 - 10.4230/LIPIcs.WABI.2017.12
DO - 10.4230/LIPIcs.WABI.2017.12
M3 - Conference contribution
AN - SCOPUS:85028777576
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 17th International Workshop on Algorithms in Bioinformatics, WABI 2017
A2 - Reinert, Knut
A2 - Schwartz, Russell
PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
T2 - 17th International Workshop on Algorithms in Bioinformatics, WABI 2017
Y2 - 21 August 2017 through 23 August 2017
ER -