UniIF: Unified Molecule Inverse Folding
Authors: Zhangyang Gao, Jue Wang, Cheng Tan, Lirong Wu, Yufei Huang, Siyuan Li, Zhirui Ye, Stan Z. Li
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through comprehensive evaluations across various tasks such as protein design, RNA design, and material design, we demonstrate that our proposed method surpasses state-of-the-art methods on all tasks. |
| Researcher Affiliation | Academia | Zhangyang Gao 1,2; Jue Wang 1,2; Cheng Tan 1,2; Lirong Wu 2; Yufei Huang 2; Siyuan Li 2; Zhirui Ye 2; Stan Z. Li 2 (1 Zhejiang University, 2 Westlake University) |
| Pseudocode | No | The paper describes its model architecture and components using text and mathematical equations, but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | No | The code will be released upon acceptance. |
| Open Datasets | Yes | We evaluate UniIF on the CATH4.3 dataset [30] following prior works [11, 8]. For structural time-split evaluation, we use the CASP15 dataset [11]... For sequence time-split evaluation, we use the NovelPro dataset [8]... We conduct RNA experiments on the dataset collected by RDesign [34]... We evaluate UniIF on the CHILI-3K dataset [6]... |
| Dataset Splits | Yes | The dataset is split by the CATH topology classification code, yielding 16,631 training, 1,516 validation, and 1,864 testing samples. The RNA dataset contains 2,218 tertiary structures, divided into training (1,774 structures), testing (223 structures), and validation (221 structures) sets based on structural similarity. The CHILI-3K dataset is randomly split into training (80%), validation (10%), and testing (10%) sets (a sketch of such a split appears after the table). |
| Hardware Specification | Yes | All experiments are conducted on an NVIDIA A100 with 80 GB memory. The longest training time is about one day. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and various network architectures (MLP, GNN, Transformer) but does not provide specific software dependency versions (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | UniIF consists of 10 layers of BlockGAT with a hidden dimension of 128. It is trained using the Adam optimizer with a learning rate of 1e-3 and a batch size of 8 for 50 epochs. Experiments are repeated three times with different seeds, using early stopping with a patience of 50 epochs and training for up to 1,000 epochs. Nodes/edges are randomly dropped with probability p to prevent overfitting; the best performance is achieved when p = 0.05 (a configuration sketch follows the table). |
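
For reference, here is a minimal Python sketch of the 80/10/10 random split reported for CHILI-3K in the Dataset Splits row. The function name, seed handling, and rounding behavior are assumptions on our part, since the UniIF code has not been released.

```python
import random

def random_split(samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle and partition samples into train/val/test subsets.

    A sketch of the 80/10/10 random split the paper reports for
    CHILI-3K; seed handling and rounding are assumptions, not taken
    from the (unreleased) UniIF code.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)          # deterministic shuffle
    n_train = int(train_frac * len(items))      # ~80%
    n_val = int(val_frac * len(items))          # ~10%
    return (items[:n_train],                    # training set
            items[n_train:n_train + n_val],     # validation set
            items[n_train + n_val:])            # test set (remainder, ~10%)
```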
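Likewise, a hedged PyTorch sketch of the training configuration reported in the Experiment Setup row (Adam, learning rate 1e-3, 10 layers, hidden dimension 128, node/edge dropout p = 0.05). The BlockGAT encoder itself is not public, so `model` below is a placeholder that only mirrors the reported depth and width, and `drop_edges` is one plausible reading of the described node/edge dropout.

```python
import torch
from torch import nn

HIDDEN_DIM = 128  # reported hidden dimension
NUM_LAYERS = 10   # reported number of BlockGAT layers
DROP_P = 0.05     # reported best node/edge dropout probability

def drop_edges(edge_index: torch.Tensor, p: float = DROP_P) -> torch.Tensor:
    """Randomly drop edges of a (2, E) edge-index tensor with probability p.

    One plausible reading of the paper's node/edge dropout; the actual
    UniIF implementation may differ.
    """
    keep = torch.rand(edge_index.size(1)) >= p  # boolean mask over edges
    return edge_index[:, keep]

# Placeholder standing in for the (unreleased) 10-layer BlockGAT encoder;
# only the reported depth and width are mirrored here.
model = nn.Sequential(*[nn.Linear(HIDDEN_DIM, HIDDEN_DIM) for _ in range(NUM_LAYERS)])

# Optimizer settings as reported: Adam with learning rate 1e-3.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```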