Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

gRNAde: Geometric Deep Learning for 3D RNA inverse design

Authors: Chaitanya Joshi, Arian Jamasb, Ramon Viñas, Charles Harris, Simon Mathis, Alex Morehead, Rishabh Anand, Pietro Liò

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental On a single-state fixed backbone re-design benchmark of 14 RNA structures from the PDB identified by Das et al. (2010), gRNAde obtains higher native sequence recovery rates (56% on average) compared to Rosetta (45% on average), taking under a second to produce designs compared to the reported hours for Rosetta. We further demonstrate the utility of gRNAde on a new benchmark of multi-state design for structurally flexible RNAs, as well as zero-shot ranking of mutational fitness landscapes in a retrospective analysis of a recent ribozyme. Experimental wet lab validation on 10 different structured RNA backbones finds that gRNAde has a success rate of 50% at designing pseudoknotted RNA structures, a significant advance over 35% for Rosetta.
Researcher Affiliation Collaboration 1University of Cambridge, UK, 2Prescient Design, Genentech, Roche, 3EPFL, Switzerland, 4University of Missouri, USA, 5National University of Singapore
Pseudocode Yes Listing 1: Pseudocode for multi-state GNN encoder layer.
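The core idea behind the multi-state encoder layer in Listing 1 is a shared message-passing update applied independently to each conformational state of the same RNA, so the layer is permutation-equivariant over the state axis. A minimal numpy sketch of that idea (the function and weight names are illustrative, not the authors' GVP-GNN implementation):

```python
import numpy as np

def multi_state_encoder_layer(h, edges, W_msg, W_upd):
    """One multi-state message-passing layer (illustrative sketch).

    h:     [S, N, D] node features for S conformational states
    edges: list of (src, dst) index pairs, shared across states
    W_msg, W_upd: [D, D] weight matrices, shared across states,
    so permuting the state axis permutes the output identically.
    """
    S, N, D = h.shape
    out = np.zeros_like(h)
    for s in range(S):                 # same weights for every state
        msg = np.zeros((N, D))
        for src, dst in edges:         # aggregate neighbour messages
            msg[dst] += h[s, src] @ W_msg
        out[s] = np.tanh(h[s] @ W_upd + msg)
    return out
```

Because the weights are shared and states never interact inside the layer, any set-pooling over the state axis afterwards (e.g. a mean) yields an order-invariant multi-state representation.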
Open Source Code Yes Open source code and tutorials are available at: github.com/chaitjo/geometric-rna-design
Open Datasets Yes We create a machine learning-ready dataset for RNA inverse design using RNASolo (Adamczyk et al., 2022), a novel repository of RNA 3D structures extracted from solo RNAs, protein-RNA complexes, and DNA-RNA hybrids in the PDB.
Dataset Splits Yes After clustering, we split the RNAs into training (4000 samples), validation and test sets (100 samples each) to evaluate two different design scenarios:
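The split described above is cluster-based: structurally similar RNAs are grouped first, then whole clusters are assigned to one partition so that near-duplicates never straddle the train/test boundary. A dependency-free sketch of that idea, with hypothetical names and scaled-down sizes (not the paper's exact procedure):

```python
import random

def cluster_split(cluster_ids, n_val=100, n_test=100, seed=0):
    """Assign entire similarity clusters to train/val/test so that
    structurally similar samples share a partition (illustrative
    sketch; cluster_ids[i] is the cluster label of sample i)."""
    clusters = {}
    for idx, c in enumerate(cluster_ids):
        clusters.setdefault(c, []).append(idx)
    order = sorted(clusters)
    random.Random(seed).shuffle(order)      # shuffle clusters, not samples
    splits = {"test": [], "val": [], "train": []}
    for c in order:
        if len(splits["test"]) < n_test:    # fill test first
            splits["test"] += clusters[c]
        elif len(splits["val"]) < n_val:    # then validation
            splits["val"] += clusters[c]
        else:                               # remainder is training
            splits["train"] += clusters[c]
    return splits
```

Filling the held-out sets with whole clusters (rather than sampling individual structures) is what prevents leakage of near-identical RNAs into training.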
Hardware Specification Yes sampling 100+ designs in 1 second for an RNA of 60 nucleotides on an A100 GPU (<10 seconds on CPU)... approximate peak GPU usage for max. number of states = 1: 12GB, 3: 28GB, 5: 50GB on a single A100 with at most 3000 total nodes in a mini-batch). This research was partially supported by Google TPU Research Cloud and Cambridge Dawn Supercomputer Pioneer Project compute grants.
Software Dependencies No The paper mentions using "PyTorch Geometric (Fey & Lenssen, 2019)" but does not provide specific version numbers for this or any other software dependencies, which is required for reproducibility.
Experiment Setup Yes All models use 4 encoder and 4 decoder GVP-GNN layers, with 128 scalar/16 vector node features, 64 scalar/4 vector edge features, and dropout probability 0.5, resulting in 2,147,944 trainable parameters. All models are trained for a maximum of 50 epochs using the Adam optimiser with an initial learning rate of 0.0001, which is reduced by a factor of 0.9 when validation performance plateaus with a patience of 5 epochs.
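The learning-rate schedule described here matches the semantics of PyTorch's `ReduceLROnPlateau` (decay by a factor on validation stagnation). A dependency-free sketch of the decay logic, assuming "patience of 5 epochs" means more than five consecutive epochs without improvement triggers a decay (not the authors' code):

```python
class PlateauLR:
    """Minimal sketch of the stated schedule: start at 1e-4 and
    multiply the learning rate by 0.9 once the validation loss has
    failed to improve for more than `patience` consecutive epochs."""

    def __init__(self, lr=1e-4, factor=0.9, patience=5):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("inf")   # best validation loss seen so far
        self.bad = 0               # consecutive non-improving epochs

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad = val_loss, 0
        else:
            self.bad += 1
            if self.bad > self.patience:   # plateau exceeded: decay
                self.lr *= self.factor
                self.bad = 0
        return self.lr
```

In a real training loop one would call `step(val_loss)` once per epoch and feed the returned rate to the optimiser; with PyTorch, `torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.9, patience=5)` implements the same behaviour.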