RDesign: Hierarchical Data-efficient Representation Learning for Tertiary Structure-based RNA Design

Authors: Cheng Tan, Yijie Zhang, Zhangyang Gao, Bozhen Hu, Siyuan Li, Zicheng Liu, Stan Z. Li

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate RDesign on the tertiary structure-based RNA design task by comparing it with four categories of baseline models ...
Researcher Affiliation | Academia | 1Zhejiang University, Hangzhou, China; 2AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China; 3McGill University, Montréal, Québec, Canada. {tancheng,gaozhangyang}@westlake.edu.cn; yj.zhang@mail.mcgill.ca
Pseudocode | No | The paper describes algorithms and pipelines but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code and benchmark dataset are available at github.com/A4Bio/RDesign.
Open Datasets | Yes | We train and assess performance on our proposed RNA structure benchmark dataset, which aggregates and cleans data from RNAsolo (Adamczyk et al., 2022b) and the Protein Data Bank (PDB) (Bank, 1971; Berman et al., 2000). ... To test the generalization ability, we apply pre-trained models to the Rfam (Gardner et al., 2009; Nawrocki et al., 2015) and RNA-Puzzles (Miao et al., 2020) datasets that contain non-overlapping structures.
Dataset Splits | Yes | The benchmark dataset consists of 2218 RNA tertiary structures, which are divided into training (1774 structures), testing (223 structures), and validation (221 structures) sets based on their structural similarity.
Hardware Specification | Yes | We ran the models on an Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz and an NVIDIA A100 GPU.
Software Dependencies | Yes | The model was implemented based on the standard PyTorch Geometric (Fey & Lenssen, 2019) library using PyTorch 1.11.0.
Experiment Setup | Yes | We trained the model for 200 epochs using the Adam optimizer with a learning rate of 0.001. The batch size was set as 64. The model's encoder and decoder each had three layers. With a dropout rate of 0.1, it considered 30 nearest neighbors and a vocabulary size matching RNA's four alphabets.
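For concreteness, the Experiment Setup and Software Dependencies rows above can be read as the training configuration below. This is a minimal sketch under the quoted hyperparameters, not the authors' implementation (which is available at github.com/A4Bio/RDesign); the model class, data loader, and label field (`RDesignModel`, `train_loader`, `batch.y`) are hypothetical placeholders.

```python
# Minimal sketch of the quoted training setup; assumes PyTorch 1.11.0
# (the paper additionally uses PyTorch Geometric for structure graphs).
# `RDesignModel` and `train_loader` are hypothetical placeholders.
import torch
from torch import nn

config = {
    "epochs": 200,        # "trained the model for 200 epochs"
    "lr": 1e-3,           # Adam optimizer, learning rate 0.001
    "batch_size": 64,     # batch size 64
    "num_layers": 3,      # encoder and decoder each have three layers
    "dropout": 0.1,       # dropout rate 0.1
    "k_neighbors": 30,    # 30 nearest neighbors per nucleotide
    "vocab_size": 4,      # RNA alphabet: A, U, C, G
}

def train(model, train_loader, device="cuda"):
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])
    criterion = nn.CrossEntropyLoss()
    for epoch in range(config["epochs"]):
        model.train()
        for batch in train_loader:
            batch = batch.to(device)
            logits = model(batch)              # (num_nucleotides, vocab_size)
            loss = criterion(logits, batch.y)  # batch.y: ground-truth bases, 0..3
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

As a quick consistency check on the Dataset Splits row, the reported split sizes sum to the full benchmark: 1774 + 223 + 221 = 2218 structures.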