Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Neural representation and generation for RNA secondary structures

Authors: Zichao Yan, William L. Hamilton, Mathieu Blanchette

ICLR 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments on our newly proposed benchmarks highlight how the hierarchical approach allows more effective representation and generation of complex RNA structures, while also highlighting important challenges for future work in the area. |
| Researcher Affiliation | Academia | Zichao Yan, School of Computer Science, McGill University, Mila; William L. Hamilton, School of Computer Science, McGill University, Mila; Mathieu Blanchette, School of Computer Science, McGill University |
| Pseudocode | Yes | Algorithm 1: DFS decode RNA secondary structure |
| Open Source Code | No | No explicit statement about releasing code, or a link to a repository for the authors' implementation, is found. |
| Open Datasets | Yes | The unlabeled dataset is obtained from the complete human transcriptome, which is downloaded from the Ensembl database (Aken et al., 2016; version GRCh38). The labeled dataset is pulled from a previous study on the sequence and structural binding preferences of RNA-binding proteins (RBPs), using an in vitro selection protocol called RNAcompete-S (Cook et al., 2017). |
| Dataset Splits | Yes | We randomly split the dataset into a training set that contains 1,149,859 RNAs, and 20,000 held-out RNAs for evaluating decoding from the posterior distribution. Then, 80% of all RNAs are randomly selected for the train split, and the rest go to the test split. The KL annealing schedule was chosen using a validation set of 1,280 RNAs. |
| Hardware Specification | No | "We also thank Compute Canada for providing the computational resources." This is a general acknowledgment of HPC resources, not a specification of particular hardware (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper mentions AMSGrad (Reddi et al., 2018) as the optimizer, but does not provide version numbers for any key software components or libraries (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | Relevant hyperparameters can be found in Table S1, which lists: latent dimensionality 128, hidden units 512, G-MPNN iterations 5, T-GRU iterations 10, learning rate 1e-3, batch size 32, dropout ratio 0.2, M_TI 300, S_TI (hierarchical decoder) 100, S_TI (linearized decoder) 1000. The paper also reports training for 20 epochs, with 5 epochs of warm-up for KL annealing. |
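The reported setup can be collected into a configuration sketch. Note this is illustrative only: the hyperparameter names below are paraphrases of the Table S1 entries, and the linear form of the KL annealing ramp is an assumption, since the paper states only that 5 of the 20 training epochs are used for warm-up.

```python
# Hyperparameters as reported in Table S1 (names are paraphrased labels,
# not identifiers from the authors' code).
HPARAMS = {
    "latent_dim": 128,
    "hidden_units": 512,
    "g_mpnn_iterations": 5,
    "t_gru_iterations": 10,
    "learning_rate": 1e-3,
    "batch_size": 32,
    "dropout": 0.2,
    "epochs": 20,
    "kl_warmup_epochs": 5,
}

def kl_weight(epoch: int, warmup: int = HPARAMS["kl_warmup_epochs"]) -> float:
    """KL annealing weight for a given epoch.

    Ramps from 0 to 1 over the warm-up epochs, then stays at 1.
    A linear ramp is assumed here; the paper does not specify the
    functional form of the schedule.
    """
    return min(1.0, epoch / warmup)
```

With this sketch, the KL term would be fully weighted from epoch 5 onward (`kl_weight(5) == 1.0`), matching the reported 5-epoch warm-up within the 20-epoch training run.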