RNAFlow: RNA Structure & Sequence Design via Inverse Folding-Based Flow Matching
Authors: Divya Nori, Wengong Jin
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on the task of protein-conditioned RNA structure and sequence generation. RNAFlow outperforms a standard sequence-only approach and a recent diffusion model (Morehead et al., 2023) for nucleic acid sequence-structure generation in terms of native sequence recovery, RMSD, and lDDT. Additionally, we show that RNAFlow can be used in the motif-scaffolding setting to generate plausible RNA aptamers for G-protein-coupled receptor kinase 2 (GRK2), a target with a known sequence motif for GRK2 binding. |
| Researcher Affiliation | Academia | 1Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA 2Broad Institute of MIT and Harvard, Cambridge, MA, USA. |
| Pseudocode | Yes | Algorithm 1 RNAFlow: Train; Algorithm 2 RNAFlow-Traj: Inference |
| Open Source Code | Yes | Code is available at https://github.com/divnori/rnaflow. |
| Open Datasets | Yes | Protein-RNA complexes from the PDBBind dataset (2020 version) were used for training and evaluation (Liu et al., 2017). [...] Traj-to-Seq was trained separately on RNASolo (Adamczyk et al., 2022), a dataset of RNA sequences and structures where many of the sequences are associated with multiple structure conformers. |
| Dataset Splits | Yes | The first [split] accounts for RF2NA pre-training: all examples from complexes in the RF2NA validation or test sets were assigned to the test split, and remaining examples were randomly split into training and validation in a 9:1 ratio. [...] In the sequence similarity split, there are 1015 complexes in train, 105 in validation, and 72 in test. [...] In the RF2NA split, there are 1059 complexes in train, 117 in validation, and 16 in test. |
| Hardware Specification | Yes | which takes a few hours on an NVIDIA A5000-24GB GPU. [...] which takes one day on an NVIDIA A5000-24GB GPU |
| Software Dependencies | No | The paper mentions several tools and algorithms like 'Adam optimizer', 'RF2NA', 'Gumbel Softmax estimator', 'Kabsch align', and 'CD-HIT' and cites their original papers, but it does not specify exact version numbers for any software libraries or dependencies used in the implementation. |
| Experiment Setup | Yes | Noise-to-Seq encoder and decoder GVP layers use a node scalar feature dimension of 128, node vector feature dimension of 16, edge scalar feature dimension of 32, and edge vector feature dimension of 1. On both splits, the model was pre-trained for 100 epochs using an Adam optimizer with a learning rate of 0.001. [...] We fine-tune RNAFlow for 100 epochs which takes one day on an NVIDIA A5000-24GB GPU, and we use the Adam optimizer with a learning rate of 0.001. |
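The RF2NA-aware split described above (complexes seen by RF2NA during validation/testing go to the test set; the remainder is split 9:1 into train/validation) can be sketched as follows. This is a minimal illustrative implementation, not the authors' code; the function name and signature are hypothetical.

```python
import random

def split_complexes(all_ids, rf2na_val_test_ids, seed=0):
    """Hypothetical sketch of the RF2NA-aware split rule:
    complexes that appear in the RF2NA validation or test sets
    are assigned to the test split; the rest are shuffled and
    split 9:1 into train and validation."""
    rf2na = set(rf2na_val_test_ids)
    test = [c for c in all_ids if c in rf2na]
    rest = [c for c in all_ids if c not in rf2na]
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(rest)
    cut = int(0.9 * len(rest))  # 9:1 train/validation ratio
    return rest[:cut], rest[cut:], test
```

For example, with 1,176 PDBBind complexes of which 16 overlap RF2NA's validation/test sets, this yields a 1,044/116/16 partition, consistent in spirit with the reported 1059/117/16 counts (the exact numbers depend on how overlap is determined).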