Deciphering RNA Secondary Structure Prediction: A Probabilistic K-Rook Matching Perspective
Authors: Cheng Tan, Zhangyang Gao, Hanqun Cao, Xingran Chen, Ge Wang, Lirong Wu, Jun Xia, Jiangbin Zheng, Stan Z. Li
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments to compare our proposed RFold with state-of-the-art and commonly used approaches. Multiple experimental settings are taken into account, including standard structure prediction, generalization evaluation, large-scale benchmark evaluation, cross-family evaluation, pseudoknot prediction and inference time comparison. |
| Researcher Affiliation | Academia | (1) Zhejiang University; (2) Westlake University; (3) The Chinese University of Hong Kong; (4) University of Michigan. |
| Pseudocode | Yes | A possible solution to relax the rigid procedure is to add a checking mechanism before the Argmax function in the inference. Specifically, if the confidence given by the Softmax is low, we do not perform Argmax and assign more base pairs. It can be implemented as the following pseudo-code: `y_pred = row_col_softmax(y)`; `int_one = row_col_argmax(y_pred)`... |
| Open Source Code | Yes | The code is available at github.com/A4Bio/RFold. |
| Open Datasets | Yes | We use three benchmark datasets: (i) RNAStralign (Tan et al., 2017), one of the most comprehensive collections of RNA structures, composed of 37,149 structures from 8 RNA types; (ii) ArchiveII (Sloma & Mathews, 2016), a widely used benchmark dataset in classical RNA folding methods, containing 3,975 RNA structures from 10 RNA types; (iii) bpRNA (Singh et al., 2019), a large-scale benchmark dataset containing 102,318 structures from 2,588 RNA types. In addition, bpRNA-new (Sato et al., 2021), derived from Rfam 14.2 (Kalvari et al., 2021), contains sequences from 1,500 new RNA families. |
| Dataset Splits | Yes | Following (Chen et al., 2019), we split the RNAStralign dataset into train, validation, and test sets by stratified sampling. Following previous works (Singh et al., 2019; Sato et al., 2021; Fu et al., 2022), we train the model on bpRNA-TR0 and evaluate the performance on bpRNA-TS0 by using the best model learned from bpRNA-VL0. |
| Hardware Specification | No | We compared the running time of various methods for predicting RNA secondary structures using the RNAStralign testing set with the same experimental setting and the hardware environment as in (Fu et al., 2022). Table 7 lists inference times and notes (GPU) for some methods but does not provide specific GPU models (e.g., NVIDIA A100) or CPU details. |
| Software Dependencies | No | Following the same experimental setting as (Fu et al., 2022), we train the model for 100 epochs with the Adam optimizer. Table 7 mentions 'Pytorch' for some methods but does not provide specific version numbers for Pytorch or any other software libraries used. |
| Experiment Setup | Yes | Following the same experimental setting as (Fu et al., 2022), we train the model for 100 epochs with the Adam optimizer. The learning rate is 0.001, and the batch size is 1 for sequences with different lengths. |
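
The pseudo-code quoted in the Pseudocode row is truncated after its second line, so the full procedure is not recoverable from this page. The NumPy sketch below is one possible reading of the idea, not the authors' implementation. It assumes that `row_col_softmax` averages a row-wise and a column-wise softmax and that `row_col_argmax` keeps an entry only when it is the maximum of both its row and its column. The function `relaxed_decode` and its thresholds `thr_conf` and `thr_pair` are hypothetical names introduced here for illustration.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along one axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def row_col_softmax(y):
    # Assumed definition: mean of the row-wise and column-wise softmax.
    return 0.5 * (softmax(y, axis=1) + softmax(y, axis=0))

def row_col_argmax(y_pred):
    # Assumed definition: entry (i, j) is 1 only if it is the argmax
    # of row i and also the argmax of column j.
    n = y_pred.shape[0]
    idx = np.arange(n)
    row_hot = np.zeros_like(y_pred)
    row_hot[idx, y_pred.argmax(axis=1)] = 1.0
    col_hot = np.zeros_like(y_pred)
    col_hot[y_pred.argmax(axis=0), idx] = 1.0
    return row_hot * col_hot

def relaxed_decode(y, thr_conf=0.5, thr_pair=0.5):
    # Hypothetical checking mechanism: where the softmax confidence of the
    # selected pair is low, skip the argmax result and instead keep every
    # position whose sigmoid score exceeds thr_pair.
    y_pred = row_col_softmax(y)
    int_one = row_col_argmax(y_pred)
    conf = (y_pred * int_one).max(axis=1)      # 0 for rows with no selected pair
    low = conf < thr_conf                      # positions whose argmax is not trusted
    mask = low[:, None] | low[None, :]         # apply to rows and columns alike
    sig = 1.0 / (1.0 + np.exp(-y))
    return np.where(mask, (sig > thr_pair).astype(float), int_one)

# Usage: a symmetric score matrix with two clear pairs, (0, 3) and (1, 2).
y = np.full((4, 4), -10.0)
for i, j in [(0, 3), (1, 2)]:
    y[i, j] = y[j, i] = 10.0
contact_map = relaxed_decode(y)
```

With confident scores the relaxed decoder returns the same map as the plain row-column argmax, so each base pairs with at most one partner. Only low-confidence rows and columns fall back to thresholding, which is where more than one pair per base can appear.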