Deciphering RNA Secondary Structure Prediction: A Probabilistic K-Rook Matching Perspective

Authors: Cheng Tan, Zhangyang Gao, Hanqun Cao, Xingran Chen, Ge Wang, Lirong Wu, Jun Xia, Jiangbin Zheng, Stan Z. Li

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments to compare our proposed RFold with state-of-the-art and commonly used approaches. Multiple experimental settings are taken into account, including standard structure prediction, generalization evaluation, large-scale benchmark evaluation, cross-family evaluation, pseudoknot prediction, and inference time comparison.
Researcher Affiliation | Academia | Zhejiang University; Westlake University; The Chinese University of Hong Kong; University of Michigan.
Pseudocode | Yes | A possible solution to relax the rigid procedure is to add a checking mechanism before the Argmax function at inference. Specifically, if the confidence given by the Softmax is low, we do not perform Argmax and assign more base pairs. It can be implemented as the following pseudo-code: y_pred = row_col_softmax(y); int_one = row_col_argmax(y_pred); ...
Open Source Code | Yes | The code is available at github.com/A4Bio/RFold.
Open Datasets | Yes | We use four benchmark datasets: (i) RNAStralign (Tan et al., 2017), one of the most comprehensive collections of RNA structures, composed of 37,149 structures from 8 RNA types; (ii) ArchiveII (Sloma & Mathews, 2016), a widely used benchmark dataset in classical RNA folding methods, containing 3,975 RNA structures from 10 RNA types; (iii) bpRNA (Singh et al., 2019), a large-scale benchmark dataset containing 102,318 structures from 2,588 RNA types; (iv) bpRNA-new (Sato et al., 2021), derived from Rfam 14.2 (Kalvari et al., 2021), containing sequences from 1,500 new RNA families.
Dataset Splits | Yes | Following (Chen et al., 2019), we split the RNAStralign dataset into train, validation, and test sets by stratified sampling. Following previous works (Singh et al., 2019; Sato et al., 2021; Fu et al., 2022), we train the model on bpRNA-TR0 and evaluate performance on bpRNA-TS0, using the best model selected on bpRNA-VL0.
Hardware Specification | No | We compare the running time of various methods for predicting RNA secondary structures on the RNAStralign test set, using the same experimental setting and hardware environment as (Fu et al., 2022). Table 7 lists inference times and notes '(GPU)' for some methods but does not provide specific GPU models (e.g., NVIDIA A100) or CPU details.
Software Dependencies | No | Following the same experimental setting as (Fu et al., 2022), we train the model for 100 epochs with the Adam optimizer. Table 7 mentions 'PyTorch' for some methods but does not provide specific version numbers for PyTorch or any other software libraries used.
Experiment Setup | Yes | Following the same experimental setting as (Fu et al., 2022), we train the model for 100 epochs with the Adam optimizer. The learning rate is 0.001, and the batch size is 1 because sequences have different lengths.
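The checking-mechanism pseudo-code quoted in the Pseudocode row can be fleshed out into a runnable sketch. Only the two function names come from the paper; the softmax combination, the confidence measure, and the relaxed fallback below are illustrative assumptions, not RFold's actual implementation (which is written in PyTorch).

```python
import numpy as np

def row_col_softmax(y):
    # Softmax along rows and along columns of the score matrix, averaged.
    # (Assumption: the paper's exact combination may differ.)
    ey = np.exp(y - y.max())
    row = ey / ey.sum(axis=1, keepdims=True)
    col = ey / ey.sum(axis=0, keepdims=True)
    return 0.5 * (row + col)

def row_col_argmax(p):
    # Keep an entry only if it is the maximum of both its row and its
    # column, so each base pairs with at most one partner (the K-Rook
    # matching constraint).
    row_max = p == p.max(axis=1, keepdims=True)
    col_max = p == p.max(axis=0, keepdims=True)
    return (row_max & col_max).astype(float)

def decode_with_check(y, conf_threshold=0.9):
    # Hypothetical checking mechanism: if the softmax confidence on the
    # selected entries is low, skip the hard Argmax and admit more
    # candidate pairs via a soft threshold instead.
    y_pred = row_col_softmax(y)
    int_one = row_col_argmax(y_pred)
    conf = (y_pred * int_one).sum() / max(int_one.sum(), 1.0)
    if conf < conf_threshold:
        return (y_pred > y_pred.mean()).astype(float)
    return int_one
```

The threshold value 0.9 is a placeholder; the paper does not report one.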
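The stratified sampling mentioned in the Dataset Splits row can be sketched as below. The 80/10/10 ratio and the (rna_type, sequence) record format are assumptions for illustration; the table does not report the actual split proportions.

```python
import random
from collections import defaultdict

def stratified_split(records, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split (rna_type, sequence) records into train/val/test so that
    each RNA type keeps roughly the same proportion in every split."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for rec in records:
        by_type[rec[0]].append(rec)
    train, val, test = [], [], []
    for recs in by_type.values():
        rng.shuffle(recs)
        n = len(recs)
        n_tr = int(n * ratios[0])
        n_va = int(n * ratios[1])
        train += recs[:n_tr]
        val += recs[n_tr:n_tr + n_va]
        test += recs[n_tr + n_va:]
    return train, val, test
```

Splitting per RNA type (rather than globally) is what keeps rare families represented in all three sets.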
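For reference, the Adam optimizer reported in the Experiment Setup row (learning rate 0.001) corresponds to the standard update rule below, re-implemented in NumPy purely for illustration. The beta and epsilon values are the common defaults, which the table does not state; the paper itself trains with PyTorch's built-in Adam.

```python
import numpy as np

class Adam:
    """Minimal Adam update (Kingma & Ba, 2015) at the reported lr=1e-3."""
    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, params, grads):
        # Lazily initialize first/second moment estimates.
        if self.m is None:
            self.m = np.zeros_like(params)
            self.v = np.zeros_like(params)
        self.t += 1
        # Exponential moving averages of the gradient and its square.
        self.m = self.b1 * self.m + (1 - self.b1) * grads
        self.v = self.b2 * self.v + (1 - self.b2) * grads ** 2
        # Bias correction, then the parameter update.
        m_hat = self.m / (1 - self.b1 ** self.t)
        v_hat = self.v / (1 - self.b2 ** self.t)
        return params - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

A batch size of 1 (one variable-length sequence per step, as the table notes) simply means this update runs once per sequence.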