3DLinker: An E(3) Equivariant Variational Autoencoder for Molecular Linker Design

Authors: Yinan Huang, Xingang Peng, Jianzhu Ma, Muhan Zhang

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate our model, we choose a subset of ZINC (Sterling & Irwin, 2015). For each molecule, we perform 20 times of MMFF force field optimization using RDKit (Landrum) and choose the one with the lowest energy as the ground truth. Following the same procedure from (Hussain & Rea, 2010), the (fragments, linker) pairs are produced by enumerating all double cuts of acyclic single bonds that are not within any functional groups. In total, we obtain 365,749 (fragments, linker, coordinates) triplets and randomly split them into training (365,039), validation (351) and test (358). Evaluation. We evaluate the generated molecules for multiple 2D (graph) and 3D (coordinates) metrics, including the standard ones such as validity, uniqueness and novelty (Brown et al., 2019). (See the RDKit conformer-selection sketch after this table.)
Researcher Affiliation | Collaboration | 1 Beijing Institute for General Artificial Intelligence, 2 Tsinghua University, 3 Institute for Artificial Intelligence, Peking University.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We implement 3DLinker based on the released code of DeLinker (https://github.com/fimrie/DeLinker). Our code and data are available at https://github.com/GraphPKU/3DLinker.
Open Datasets | Yes | Dataset. To evaluate our model, we choose a subset of ZINC (Sterling & Irwin, 2015). ... In total, we obtain 365,749 (fragments, linker, coordinates) triplets and randomly split them into training (365,039), validation (351) and test (358).
Dataset Splits | Yes | In total, we obtain 365,749 (fragments, linker, coordinates) triplets and randomly split them into training (365,039), validation (351) and test (358). (See the minimal split sketch after this table.)
Hardware Specification | No | The paper does not explicitly describe the hardware used for the experiments. It mentions 'computational efficiency' in relation to not including hydrogen, but no specific hardware details are provided.
Software Dependencies | No | The paper mentions RDKit (Landrum) for MMFF force field optimization and for computing QED scores, but no version number is provided for RDKit or any other software dependency.
Experiment Setup | Yes | We trained 3DLinker for 20 epochs using Adam optimizer with a learning rate 0.006, batch size 48 and KL trade-off β = 0.6. Training details for other baselines are included in Appendix C. Each model generates 250 samples per two fragments, which leads to in total 250 × 358 = 89,500 samples. We conduct such generations three times independently for uncertainty estimation of metrics in the main Table 1. (See the training-loop sketch after this table.)
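
The dataset construction quoted under Research Type and Open Datasets above selects, for each molecule, the lowest-energy conformer after MMFF optimization with RDKit. The sketch below illustrates that step under stated assumptions: the embedding parameters, random seed, example SMILES, and the function name `lowest_energy_conformer` are illustrative and are not taken from the authors' released preprocessing code.

```python
# Sketch of the "20 times of MMFF force field optimization ... choose the one
# with the lowest energy" step, using standard RDKit calls. The number of
# embedded conformers, the seed, and the function name are assumptions.
from rdkit import Chem
from rdkit.Chem import AllChem

def lowest_energy_conformer(smiles, n_confs=20, seed=0):
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    AllChem.EmbedMultipleConfs(mol, numConfs=n_confs, randomSeed=seed)
    # MMFFOptimizeMoleculeConfs returns one (convergence flag, MMFF energy) pair per conformer.
    results = AllChem.MMFFOptimizeMoleculeConfs(mol)
    energies = [energy for _, energy in results]
    best = min(range(len(energies)), key=energies.__getitem__)
    return mol, best, energies[best]

# Example usage on an arbitrary small molecule (hypothetical input).
mol, conf_id, energy = lowest_energy_conformer("CCOc1ccccc1OC(C)=O")
```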
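
The random split reported under Dataset Splits (365,039 training, 351 validation, 358 test triplets) can be expressed schematically as below; the seed and the order in which validation and test indices are drawn are assumptions, not the authors' exact procedure.

```python
# Minimal sketch of the reported random split into training (365,039),
# validation (351) and test (358) triplets. Seed and draw order are assumptions.
import random

def split_triplets(triplets, n_val=351, n_test=358, seed=0):
    indices = list(range(len(triplets)))
    random.Random(seed).shuffle(indices)
    test = [triplets[i] for i in indices[:n_test]]
    val = [triplets[i] for i in indices[n_test:n_test + n_val]]
    train = [triplets[i] for i in indices[n_test + n_val:]]
    return train, val, test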
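
The optimization settings listed under Experiment Setup (Adam, learning rate 0.006, batch size 48, 20 epochs, KL trade-off β = 0.6) correspond to a β-weighted VAE objective. The sketch below shows only that configuration on a stand-in model; `PlaceholderVAE` and the dummy data are illustrative and are not the released 3DLinker architecture or its dataset.

```python
# Hedged sketch of the reported training configuration. Only the optimizer,
# learning rate, batch size, epoch count, and beta come from the paper; the
# model and data here are placeholders.
import torch
import torch.nn as nn

class PlaceholderVAE(nn.Module):
    """Stand-in for the released 3DLinker model; returns the two VAE loss terms."""
    def __init__(self, dim=16):
        super().__init__()
        self.enc_mu = nn.Linear(dim, dim)
        self.enc_logvar = nn.Linear(dim, dim)
        self.dec = nn.Linear(dim, dim)

    def forward(self, x):
        mu, logvar = self.enc_mu(x), self.enc_logvar(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        recon = self.dec(z)
        recon_loss = ((recon - x) ** 2).mean()
        kl_loss = -0.5 * torch.mean(1 + logvar - mu ** 2 - logvar.exp())
        return recon_loss, kl_loss

model = PlaceholderVAE()
optimizer = torch.optim.Adam(model.parameters(), lr=0.006)    # learning rate from the paper
beta = 0.6                                                    # KL trade-off from the paper
data = torch.randn(480, 16)                                   # dummy data in place of ZINC triplets
loader = torch.utils.data.DataLoader(data, batch_size=48)     # batch size 48 per the paper

for epoch in range(20):                                       # 20 epochs per the paper
    for batch in loader:
        recon_loss, kl_loss = model(batch)
        loss = recon_loss + beta * kl_loss                    # beta-weighted ELBO
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```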