RNAFlow: RNA Structure & Sequence Design via Inverse Folding-Based Flow Matching
Authors: Divya Nori, Wengong Jin
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on the task of protein-conditioned RNA structure and sequence generation. RNAFlow outperforms a standard sequence-only approach and a recent diffusion model (Morehead et al., 2023) for nucleic acid sequence-structure generation in terms of native sequence recovery, RMSD, and lDDT. Additionally, we show that RNAFlow can be used in the motif-scaffolding setting to generate plausible RNA aptamers for G-protein-coupled receptor kinase 2 (GRK2), a target with a known sequence motif for GRK2 binding. |
| Researcher Affiliation | Academia | 1Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA 2Broad Institute of MIT and Harvard, Cambridge, MA, USA. |
| Pseudocode | Yes | Algorithm 1 RNAFlow: Train; Algorithm 2 RNAFlow-Traj: Inference |
| Open Source Code | Yes | Code is available at https://github.com/divnori/rnaflow. |
| Open Datasets | Yes | Protein-RNA complexes from the PDBBind dataset (2020 version) were used for training and evaluation (Liu et al., 2017). [...] Traj-to-Seq was trained separately on RNASolo (Adamczyk et al., 2022), a dataset of RNA sequences and structures where many of the sequences are associated with multiple structure conformers. |
| Dataset Splits | Yes | The first [split] accounts for RF2NA pre-training: all examples from complexes in the RF2NA validation or test sets were assigned to the test split, and remaining examples were randomly split into training and validation in a 9:1 ratio. [...] In the sequence similarity split, there are 1015 complexes in train, 105 in validation, and 72 in test. [...] In the RF2NA split, there are 1059 complexes in train, 117 in validation, and 16 in test. |
| Hardware Specification | Yes | which takes a few hours on an NVIDIA A5000-24GB GPU. [...] which takes one day on an NVIDIA A5000-24GB GPU |
| Software Dependencies | No | The paper mentions several tools and algorithms like 'Adam optimizer', 'RF2NA', 'Gumbel Softmax estimator', 'Kabsch align', and 'CD-HIT' and cites their original papers, but it does not specify exact version numbers for any software libraries or dependencies used in the implementation. |
| Experiment Setup | Yes | Noise-to-Seq encoder and decoder GVP layers use a node scalar feature dimension of 128, node vector feature dimension of 16, edge scalar feature dimension of 32, and edge vector feature dimension of 1. On both splits, the model was pre-trained for 100 epochs using an Adam optimizer with a learning rate of 0.001. [...] We fine-tune RNAFlow for 100 epochs which takes one day on an NVIDIA A5000-24GB GPU, and we use the Adam optimizer with a learning rate of 0.001. |
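The RF2NA-aware split described above (complexes seen by RF2NA during validation/testing go to the test set; the remainder is split 9:1 into train/validation) can be sketched as follows. This is a minimal illustrative implementation, not the authors' code; the function name and signature are hypothetical.

```python
import random

def split_complexes(all_ids, rf2na_val_test_ids, seed=0):
    """Hypothetical sketch of the RF2NA-aware split rule:
    complexes that appear in the RF2NA validation or test sets
    are assigned to the test split; the rest are shuffled and
    split 9:1 into train and validation."""
    rf2na = set(rf2na_val_test_ids)
    test = [c for c in all_ids if c in rf2na]
    rest = [c for c in all_ids if c not in rf2na]
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(rest)
    cut = int(0.9 * len(rest))  # 9:1 train/validation ratio
    return rest[:cut], rest[cut:], test
```

For example, with 1,176 PDBBind complexes of which 16 overlap RF2NA's validation/test sets, this yields a 1,044/116/16 partition, consistent in spirit with the reported 1059/117/16 counts (the exact numbers depend on how overlap is determined).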