3DLinker: An E(3) Equivariant Variational Autoencoder for Molecular Linker Design
Authors: Yinan Huang, Xingang Peng, Jianzhu Ma, Muhan Zhang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate our model, we choose a subset of ZINC (Sterling & Irwin, 2015). For each molecule, we perform 20 times of MMFF force field optimization using RDKit (Landrum) and choose the one with the lowest energy as the ground truth. Following the same procedure from (Hussain & Rea, 2010), the (fragments, linker) pairs are produced by enumerating all double cuts of acyclic single bonds that are not within any functional groups. In total, we obtain 365,749 (fragments, linker, coordinates) triplets and randomly split them into training (365,039), validation (351) and test (358). Evaluation. We evaluate the generated molecules for multiple 2D (graph) and 3D (coordinates) metrics, including the standard ones such as validity, uniqueness and novelty (Brown et al., 2019). |
| Researcher Affiliation | Collaboration | 1Beijing Institute for General Artificial Intelligence 2Tsinghua University 3Institute for Artificial Intelligence, Peking University. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We implement 3DLinker based on the released code of DeLinker (https://github.com/fimrie/DeLinker). Our code and data are available at https://github.com/GraphPKU/3DLinker. |
| Open Datasets | Yes | Dataset. To evaluate our model, we choose a subset of ZINC (Sterling & Irwin, 2015). ... In total, we obtain 365,749 (fragments, linker, coordinates) triplets and randomly split them into training (365,039), validation (351) and test (358). |
| Dataset Splits | Yes | In total, we obtain 365,749 (fragments, linker, coordinates) triplets and randomly split them into training (365,039), validation (351) and test (358). |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for the experiments. It mentions 'computational efficiency' in relation to not including hydrogen, but no specific hardware details are provided. |
| Software Dependencies | No | The paper mentions RDKit (Landrum) for MMFF force field optimization and for computing QED scores, but no version number is provided for RDKit or any other software dependency. |
| Experiment Setup | Yes | We trained 3DLinker for 20 epochs using Adam optimizer with a learning rate 0.006, batch size 48 and KL trade-off β = 0.6. Training details for other baselines are included in Appendix C. Each model generates 250 samples per two fragments, which leads to in total 250 × 358 = 89,500 samples. We conduct such generations three times independently for uncertainty estimation of metrics in the main Table 1. |
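
The sample count and split sizes quoted in the table can be checked with a few lines of arithmetic. The sketch below is only a sanity check on those quoted figures; the constant and function names are hypothetical and are not taken from the 3DLinker repository.

```python
# Figures quoted in the table above. All names here are illustrative
# assumptions, not identifiers from the authors' code.
TOTAL_TRIPLETS = 365_749
SPLITS = {"train": 365_039, "validation": 351, "test": 358}
SAMPLES_PER_FRAGMENT_PAIR = 250
INDEPENDENT_RUNS = 3


def samples_per_run(test_size: int, per_pair: int) -> int:
    """Number of molecules generated in one evaluation run."""
    return test_size * per_pair


def split_gap(total: int, splits: dict) -> int:
    """Quoted total minus the sum of the quoted split sizes."""
    return total - sum(splits.values())


per_run = samples_per_run(SPLITS["test"], SAMPLES_PER_FRAGMENT_PAIR)
print(per_run)                               # samples in one run
print(per_run * INDEPENDENT_RUNS)            # samples across all three runs
print(split_gap(TOTAL_TRIPLETS, SPLITS))     # triplets not covered by the splits
```

One run yields 250 × 358 = 89,500 samples, matching the quoted figure, and three independent runs give 268,500 in total. The three quoted split sizes sum to 365,748, one fewer than the quoted total of 365,749; the quoted text does not explain the difference.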