Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

RiboFlow: Conditional De Novo RNA Co-Design via Synergistic Flow Matching

Authors: Runze Ma, Zhongyue Zhang, Zichen Wang, Chenqing Hua, Jiahua Rao, Zhuomin Zhou, Shuangjia Zheng

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments reveal that RiboFlow not only outperforms state-of-the-art RNA design methods by a large margin but also showcases controllable capabilities for achieving high binding affinity to target ligands. Our work bridges critical gaps in controllable RNA design, offering a framework for structure-aware, data-efficient generation. (Section 5: Experiments)
Researcher Affiliation | Academia | (1) Shanghai Jiao Tong University; (2) Yale University; (3) Sun Yat-Sen University; (4) Lingang Laboratory
Pseudocode | Yes | Algorithm 1: RiboFlow Inference
Open Source Code | Yes | We provide an anonymous GitHub link in the abstract. The complete code and data will be made public once the paper is accepted.
Open Datasets | Yes | Additionally, we curate RiboBind, a large-scale dataset of RNA-molecule interactions, to resolve the scarcity of high-quality structural data. To establish foundational priors for RNA structural validity, we pre-train RiboFlow on RNAsolo [2], a curated database of single-stranded RNA 3D structures.
Dataset Splits | Yes | To evaluate model generalization, we employ the following partitioning strategies for RiboBind: (1) Sequence-based Evaluation: RNA sequences are clustered at a 50% identity threshold using MMseqs2 [36]. A test set is formed from the cluster centroids, comprising 66 RNA-ligand pairs that include 20 of the most frequently observed ligands. All non-centroid sequences constitute the corresponding training set. (2) Structure-based Evaluation: Following the methodology of gRNAde [24], RNA structures are clustered using US-align [49] with a TM-score threshold of 0.45, yielding 277 distinct structural classes. These classes are then partitioned into a training set (249 classes) and a test set (28 classes) at a 9:1 ratio. (3) Few-shot Evaluation: Contains 15 RNA-ligand pairs involving ligands that appear only once in the RiboBind dataset, designed to evaluate low-resource generalization.
Hardware Specification | Yes | During the pre-training stage, a batch size of 32 is selected, utilizing 4 A100 80GB accelerator cards, completing 200K pre-training steps in nearly 20 hours. ... Notably, attempting longer sequences on an RTX 4090 GPU (24GB memory) triggers an out-of-memory (OOM) error due to the memory requirements of RhoFold.
Software Dependencies | No | The paper mentions several tools and external software (e.g., MMseqs2 [36], US-align [49], gRNAde [24], RhoFold [35], AutoDock Vina [41], RDKit, AlphaFold3 [1]) but does not provide version numbers for these components or for the programming languages and libraries used in its own implementation.
Experiment Setup | Yes | RiboFlow utilizes several hyperparameters in the experiments, which are crucial for the model's training and sampling processes. Therefore, we provide some key hyperparameters in Table 8 to facilitate the reproduction of our experiments. The optimal hyperparameters are indicated in bold. Table 8 gives specific values for the atom embedding dimension, hidden dimension, number of blocks, number of heads, learning rate, batch size, optimizer (AdamW), and the training and sampling schedules for translations and rotations.
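The Dataset Splits entry describes two cluster-level partitioning strategies. The Python below is a minimal sketch of both under stated assumptions. It is not the authors' code. `centroid_split` and `class_split` are hypothetical helper names. The `(representative, member)` input mirrors the two-column `*_cluster.tsv` file that `mmseqs easy-cluster` writes, and it assumes that MMseqs2 cluster representatives are what the entry calls "centroids". The seeded random shuffle is also an assumption: it is one way to hold out 10% of the structural classes, because the entry does not say how the 28 test classes were chosen.

```python
import random


def centroid_split(pairs):
    """Sequence-based split (sketch): cluster representatives -> test, other members -> train.

    pairs: (representative_id, member_id) tuples, mirroring the two-column
    cluster TSV from `mmseqs easy-cluster` (each representative is also
    listed as a member of its own cluster).
    """
    reps = {rep for rep, _ in pairs}
    members = {member for _, member in pairs}
    test = sorted(reps)
    train = sorted(members - reps)
    return train, test


def class_split(class_of, test_frac=0.1, seed=0):
    """Structure-based split (sketch): every structural class lands wholly in train or test.

    class_of: dict mapping sample id -> structural class id, e.g. classes
    obtained by clustering US-align TM-scores at a 0.45 threshold.
    The seeded shuffle is an assumption; the source does not say how test classes were chosen.
    """
    classes = sorted(set(class_of.values()))
    random.Random(seed).shuffle(classes)        # reproducible class order
    n_test = round(len(classes) * test_frac)    # 277 classes -> 28 test classes
    test_classes = set(classes[:n_test])
    train = sorted(s for s, c in class_of.items() if c not in test_classes)
    test = sorted(s for s, c in class_of.items() if c in test_classes)
    return train, test, test_classes


if __name__ == "__main__":
    pairs = [("a", "a"), ("a", "b"), ("c", "c"), ("c", "d"), ("c", "e")]
    print(centroid_split(pairs))  # (['b', 'd', 'e'], ['a', 'c'])

    # Toy data: 1,000 samples spread over 277 structural classes.
    class_of = {f"s{i}": i % 277 for i in range(1000)}
    train, test, test_classes = class_split(class_of)
    print(len(test_classes), 277 - len(test_classes))  # 28 249
```

Whole classes move together in `class_split`, so no structural class appears on both sides of the split. With 277 classes and a 0.1 test fraction, rounding yields the 249/28 partition reported in the entry.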
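The Hardware Specification entry reports enough figures for a rough pre-training compute estimate. The sketch below makes two assumptions. First, it treats the batch size of 32 as the global batch across all four GPUs, since the entry does not say whether it is per device. Second, it takes "nearly 20 hours" as 20 hours, which makes the throughput figure a slight underestimate and the GPU-hour figure a slight overestimate. `pretraining_budget` is a hypothetical helper.

```python
def pretraining_budget(steps, hours, batch_size, n_gpus):
    """Rough compute budget from reported training figures (sketch).

    Assumes batch_size is the global batch, i.e. summed over all GPUs.
    """
    steps_per_sec = steps / (hours * 3600)  # optimizer steps per wall-clock second
    samples_seen = steps * batch_size       # total training examples processed
    gpu_hours = n_gpus * hours              # accelerator time consumed
    return steps_per_sec, samples_seen, gpu_hours


if __name__ == "__main__":
    sps, samples, gpu_h = pretraining_budget(steps=200_000, hours=20, batch_size=32, n_gpus=4)
    print(f"{sps:.2f} steps/s, {samples:,} samples, {gpu_h} A100-hours")
    # 2.78 steps/s, 6,400,000 samples, 80 A100-hours
```

Under these assumptions, pre-training processes about 6.4 million samples in roughly 80 A100-hours, at about 2.8 optimizer steps per second. If 32 is instead a per-GPU batch size, the sample count rises fourfold to 25.6 million.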