RDesign: Hierarchical Data-efficient Representation Learning for Tertiary Structure-based RNA Design
Authors: Cheng Tan, Yijie Zhang, Zhangyang Gao, Bozhen Hu, Siyuan Li, Zicheng Liu, Stan Z. Li
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate RDesign on the tertiary structure-based RNA design task by comparing it with four categories of baseline models: |
| Researcher Affiliation | Academia | ¹Zhejiang University, Hangzhou, China; ²AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China; ³McGill University, Montréal, Québec, Canada. {tancheng,gaozhangyang}@westlake.edu.cn; yj.zhang@mail.mcgill.ca |
| Pseudocode | No | The paper describes algorithms and pipelines but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code and benchmark dataset are available at github.com/A4Bio/RDesign. |
| Open Datasets | Yes | We train and assess performance on our proposed RNA structure benchmark dataset which aggregates and cleans data from RNAsolo (Adamczyk et al., 2022b) and the Protein Data Bank (PDB) (Bank, 1971; Berman et al., 2000). ... To test the generalization ability, we apply pre-trained models to the Rfam (Gardner et al., 2009; Nawrocki et al., 2015) and RNA-Puzzles (Miao et al., 2020) datasets that contain non-overlapping structures. |
| Dataset Splits | Yes | The benchmark dataset consists of 2218 RNA tertiary structures, which are divided into training (1774 structures), testing (223 structures), and validation (221 structures) sets based on their structural similarity. |
| Hardware Specification | Yes | We ran the models on an Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz and an NVIDIA A100 GPU. |
| Software Dependencies | Yes | The model was implemented based on the standard PyTorch Geometric (Fey & Lenssen, 2019) library using the PyTorch 1.11.0 library. |
| Experiment Setup | Yes | We trained the model for 200 epochs using the Adam optimizer with a learning rate of 0.001. The batch size was set as 64. The model's encoder and decoder each had three layers. With a dropout rate of 0.1, it considered 30 nearest neighbors and a vocabulary size matching RNA's four alphabets. |
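As a quick sanity check on the Dataset Splits row, the reported split sizes add up to the stated benchmark total of 2218 structures, giving a roughly 80/10/10 partition. The snippet below just verifies that arithmetic:

```python
# Split sizes quoted in the Dataset Splits row.
splits = {"train": 1774, "test": 223, "val": 221}
total = sum(splits.values())
assert total == 2218  # matches the stated benchmark size
for name, n in splits.items():
    print(f"{name}: {n} structures ({n / total:.1%})")
# train: 1774 structures (80.0%)
# test: 223 structures (10.1%)
# val: 221 structures (10.0%)
```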
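To make the Experiment Setup row concrete, here is a minimal, hypothetical PyTorch sketch that wires the reported hyperparameters together: Adam with learning rate 0.001, three encoder and three decoder layers, dropout 0.1, a 30-nearest-neighbor graph, and a four-letter vocabulary. This is not RDesign's actual architecture; the model class, the residual MLP layers, and the hidden dimension of 128 are assumptions for illustration only (the authors' real code is at github.com/A4Bio/RDesign).

```python
import torch
from torch import nn
from torch_geometric.nn import knn_graph  # requires torch-cluster installed

# Reported hyperparameters from the Experiment Setup row.
VOCAB_SIZE = 4      # RNA alphabet: A, U, C, G
NUM_LAYERS = 3      # encoder depth and decoder depth
DROPOUT = 0.1
K_NEIGHBORS = 30    # nearest neighbors per nucleotide
HIDDEN_DIM = 128    # assumed for illustration; not stated in the quote

class ToyRNADesignModel(nn.Module):
    """Minimal stand-in model; not RDesign's actual architecture."""

    def __init__(self):
        super().__init__()
        # Embed per-nucleotide 3D coordinates into a hidden representation.
        self.embed = nn.Linear(3, HIDDEN_DIM)
        # Three "encoder" plus three "decoder" residual MLP layers.
        self.layers = nn.ModuleList([
            nn.Sequential(
                nn.Linear(HIDDEN_DIM, HIDDEN_DIM),
                nn.ReLU(),
                nn.Dropout(DROPOUT),
            )
            for _ in range(NUM_LAYERS * 2)
        ])
        self.readout = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    def forward(self, coords):
        # Build the 30-nearest-neighbor graph over nucleotide coordinates;
        # this sketch only constructs the graph and does not implement
        # RDesign's message passing over it.
        edge_index = knn_graph(coords, k=K_NEIGHBORS)  # noqa: F841
        h = self.embed(coords)
        for layer in self.layers:
            h = h + layer(h)        # residual update per layer
        return self.readout(h)      # per-position logits over A/U/C/G

model = ToyRNADesignModel()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # reported lr
criterion = nn.CrossEntropyLoss()

# One illustrative step on random data; the reported run uses batches of
# 64 structures for 200 epochs, which is elided here.
coords = torch.randn(50, 3)                    # 50 nucleotides, xyz coords
target = torch.randint(0, VOCAB_SIZE, (50,))   # ground-truth sequence labels
loss = criterion(model(coords), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"toy loss: {loss.item():.3f}")
```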