Learning Over Molecular Conformer Ensembles: Datasets and Benchmarks
Authors: Yanqiao Zhu, Jeehyun Hwang, Keir Adams, Zhen Liu, Bozhao Nan, Brock Stenfors, Yuanqi Du, Jatin Chauhan, Olaf Wiest, Olexandr Isayev, Connor W. Coley, Yizhou Sun, Wei Wang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In addition, we conduct a comprehensive empirical study, which benchmarks representative 1D, 2D, and 3D MRL models, along with two strategies that explicitly incorporate conformer ensembles into 3D models. Our findings reveal that direct learning from an accessible conformer space can improve performance on a variety of tasks and models. Our experimental results confirm the potential effectiveness of incorporating conformer ensembles in MRL, highlighting the improvements over conventional single-conformation 3D networks. |
| Researcher Affiliation | Academia | UCLA, MIT, CMU, Notre Dame, Cornell |
| Pseudocode | No | The paper does not contain any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | Yes | Project homepage: https://github.com/SXKDZ/MARCEL ... Detailed information regarding dataset access, data formatting, and loading procedures can be found at our GitHub repository https://github.com/SXKDZ/MARCEL. |
| Open Datasets | Yes | Detailed information regarding dataset access, data formatting, and loading procedures can be found at our GitHub repository https://github.com/SXKDZ/MARCEL. Our Drugs-75K can be accessed at https://github.com/SXKDZ/MARCEL/tree/main/datasets/Drugs. As for the conformer ensembles and descriptors that we generated, they are licensed under the Apache License. |
| Dataset Splits | Yes | Each dataset is partitioned randomly into three subsets: 70% for training, 10% for validation, and 20% for test. (A minimal sketch of this split follows the table.) |
| Hardware Specification | Yes | Most of the experiments are conducted on servers equipped with Nvidia A100 GPUs, each with 40GB of memory. For memory-intensive models such as GemNet and LEFTNet, we use servers with Nvidia H100 GPUs, each with 80GB memory. |
| Software Dependencies | No | The paper mentions using "PyTorch [60] and PyTorch-Geometric [61] to implement all deep learning models" but does not specify version numbers for these software components. |
| Experiment Setup | Yes | Each model is trained over 2,000 epochs using the Adam optimizer [55] with early stopping triggered if there is no improvement on the training loss over 200 epochs. ... To ensure a fair comparison, the hidden dimension size is uniformly set to 128 for all models. (A minimal sketch of this training setup follows the table.) |
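The Dataset Splits row describes a simple 70%/10%/20% random partition. Below is a minimal sketch of that split, assuming a PyTorch-style dataset object; the `split_dataset` helper and the fixed `seed` are illustrative and not taken from the paper.

```python
# Minimal sketch of the 70/10/20 random split described in the paper.
# `split_dataset` and the fixed seed are hypothetical, not the authors' code.
import torch
from torch.utils.data import random_split

def split_dataset(dataset, seed=0):
    """Partition a dataset into 70% train, 10% validation, 20% test."""
    n = len(dataset)
    n_train = int(0.7 * n)
    n_val = int(0.1 * n)
    n_test = n - n_train - n_val  # remainder absorbs rounding, keeping full coverage
    generator = torch.Generator().manual_seed(seed)  # reproducible shuffling
    return random_split(dataset, [n_train, n_val, n_test], generator=generator)
```

Assigning the remainder to the test set guarantees the three subsets cover every example even when 70% and 10% of the dataset size do not divide evenly.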
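The Experiment Setup row likewise maps onto a standard training loop. The sketch below assumes PyTorch Geometric-style batches with a `y` target attribute, a caller-supplied `loss_fn`, and a default Adam learning rate, none of which appear in the quoted excerpt; only the 2,000-epoch budget, the 200-epoch early-stopping patience on the training loss, and the Adam optimizer come from the paper.

```python
# Sketch of the quoted setup: Adam, up to 2,000 epochs, early stopping after
# 200 epochs without training-loss improvement. The learning rate and batch
# layout (`batch.y`) are assumptions, not from the paper.
import torch

def train(model, loader, loss_fn, max_epochs=2000, patience=200, lr=1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_loss, stale_epochs = float("inf"), 0
    for epoch in range(max_epochs):
        model.train()
        epoch_loss = 0.0
        for batch in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(batch), batch.y)  # assumed batch layout
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        if epoch_loss < best_loss:  # early stopping tracks the training loss
            best_loss, stale_epochs = epoch_loss, 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break
    return model
```

The uniform hidden dimension of 128 noted in the same row would be set in each model's constructor rather than in this loop.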