Neural representation and generation for RNA secondary structures

Authors: Zichao Yan, William L. Hamilton, Mathieu Blanchette

Venue: ICLR 2021

Reproducibility assessment. Each variable below is listed with its result and the LLM's supporting response (quoted evidence or rationale).

Research Type: Experimental
"Extensive experiments on our newly proposed benchmarks highlight how the hierarchical approach allows more effective representation and generation of complex RNA structures, while also highlighting important challenges for future work in the area."

Researcher Affiliation: Academia
Zichao Yan, School of Computer Science, McGill University, Mila (zichao.yan@mail.mcgill.ca); William L. Hamilton, School of Computer Science, McGill University, Mila (wlh@cs.mcgill.ca); Mathieu Blanchette, School of Computer Science, McGill University (blanchem@cs.mcgill.ca)

Pseudocode: Yes
"Algorithm 1: DFS decode RNA secondary structure" (see the sketch below).

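The paper's Algorithm 1 decodes an RNA secondary structure by depth-first search over its tree representation. The following minimal Python sketch is not the authors' algorithm; it only illustrates the underlying idea that a dot-bracket structure corresponds to a tree whose depth-first traversal regenerates the string. All class and function names here are hypothetical.

```python
# Minimal sketch: a dot-bracket RNA secondary structure maps to a tree,
# and a DFS over that tree emits the structure back. This illustrates
# the idea behind DFS decoding, not the paper's Algorithm 1.

class Node:
    def __init__(self, kind):
        self.kind = kind          # "pair" (base pair), "dot" (unpaired), or "root"
        self.children = []

def parse_dot_bracket(db):
    """Build a tree from a dot-bracket string using a stack."""
    root = Node("root")
    stack = [root]
    for ch in db:
        if ch == "(":             # open a base pair: descend one level
            node = Node("pair")
            stack[-1].children.append(node)
            stack.append(node)
        elif ch == ")":           # close the current base pair: ascend
            stack.pop()
        else:                     # ".": an unpaired base at the current level
            stack[-1].children.append(Node("dot"))
    return root

def dfs_decode(node):
    """Regenerate the dot-bracket string by depth-first traversal."""
    if node.kind == "dot":
        return "."
    inner = "".join(dfs_decode(child) for child in node.children)
    return "(" + inner + ")" if node.kind == "pair" else inner

structure = "((..(...)..))"
assert dfs_decode(parse_dot_bracket(structure)) == structure
```
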
Open Source Code: No
No explicit statement about releasing code, and no link to a repository for the authors' implementation, was found.

Open Datasets: Yes
"The unlabeled dataset is obtained from the complete human transcriptome which is downloaded from the Ensembl database (Aken et al. (2016); version GRCh38). The labeled dataset is pulled from a previous study on sequence and structural binding preference of RNA binding proteins (RBP), using an in vitro selection protocol called RNAcompete-S (Cook et al., 2017)."

Dataset Splits: Yes
"We randomly split the dataset into a training set that contains 1,149,859 RNAs, and 20,000 held-out RNAs for evaluating decoding from the posterior distribution." For the labeled data, 80% of all RNAs are randomly assigned to the train split and the remainder to the test split. The KL annealing schedule was chosen using a validation set of 1,280 RNAs. (A sketch of the split logic follows below.)

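A minimal sketch of the reported splitting scheme. The function names, variable names, and seed are assumptions for illustration; the paper specifies only the counts and the 80/20 ratio.

```python
# Hedged sketch of the two splits described above; only the sizes and the
# 80/20 ratio come from the paper, everything else is an assumption.
import random

def split_unlabeled(rnas, n_heldout=20_000, seed=0):
    """Hold out a fixed number of RNAs for posterior-decoding evaluation."""
    rng = random.Random(seed)
    indices = list(range(len(rnas)))
    rng.shuffle(indices)
    heldout = [rnas[i] for i in indices[:n_heldout]]
    train = [rnas[i] for i in indices[n_heldout:]]
    return train, heldout

def split_labeled(rnas, train_frac=0.8, seed=0):
    """Random 80/20 train/test split for the labeled RNAcompete-S data."""
    rng = random.Random(seed)
    indices = list(range(len(rnas)))
    rng.shuffle(indices)
    cut = int(train_frac * len(rnas))
    return [rnas[i] for i in indices[:cut]], [rnas[i] for i in indices[cut:]]
```
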
Hardware Specification: No
"We also thank Compute Canada for providing the computational resources." This is a general acknowledgement of HPC resources, not a specification of the hardware used (e.g., GPU/CPU models or memory).

Software Dependencies: No
The paper mentions AMSGrad (Reddi et al., 2018) as the optimizer but does not provide version numbers for any key software components or libraries (e.g., PyTorch, TensorFlow). (See the note below.)

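The framework is not stated in the paper. If the implementation used PyTorch, which is an assumption, AMSGrad is available as a flag on the Adam optimizer, shown here with the learning rate reported in Table S1:

```python
# Assumption: PyTorch as the framework (the paper does not say). AMSGrad
# (Reddi et al., 2018) is exposed as the `amsgrad` flag of torch.optim.Adam.
import torch

model = torch.nn.Linear(128, 512)  # placeholder module, not the paper's model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)
```
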
Experiment Setup: Yes
Relevant hyperparameters are given in Table S1: latent dimensionality 128, hidden units 512, G-MPNN iterations 5, T-GRU iterations 10, learning rate 1e-3, batch size 32, dropout ratio 0.2, M_TI = 300, S_TI = 100 (hierarchical decoder), S_TI = 1000 (linearized decoder). The model is trained for 20 epochs, with 5 epochs of warm-up for KL annealing.

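For reference, a hedged transcription of these settings into a single config dict; the key names are assumptions chosen for readability, while the values are as reported in Table S1 and the training description.

```python
# Key names are illustrative; values are the reported hyperparameters.
CONFIG = {
    "latent_dim": 128,                  # latent dimensionality
    "hidden_units": 512,
    "g_mpnn_iterations": 5,             # graph message-passing iterations
    "t_gru_iterations": 10,             # tree GRU iterations
    "learning_rate": 1e-3,
    "batch_size": 32,
    "dropout": 0.2,
    "M_TI": 300,
    "S_TI_hierarchical_decoder": 100,
    "S_TI_linearized_decoder": 1000,
    "epochs": 20,
    "kl_warmup_epochs": 5,              # KL annealing warm-up
}
```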