Swallowing the Bitter Pill: Simplified Scalable Conformer Generation

Authors: Yuyang Wang, Ahmed A. A. Elhag, Navdeep Jaitly, Joshua M. Susskind, Miguel Ángel Bautista

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results show that scaling up the model capacity leads to large gains in generalization performance without enforcing inductive biases like rotational equivariance. MCF represents an advance in extending diffusion models to handle complex scientific problems in a conceptually simple, scalable and effective manner." (Abstract) and "Experiments on recent conformer generation benchmarks show MCF surpasses strong baselines by a gap that gets larger as we scale model capacity." (Introduction)
Researcher Affiliation | Industry | Yuyang Wang (1), Ahmed A. Elhag (1,2), Navdeep Jaitly (1), Joshua M. Susskind (1), Miguel Ángel Bautista (1); (1) Apple; (2) work was completed while A.A.E. was an intern with Apple. Correspondence to: {yuyang_wang4, aa_elhag, njaitly, jsusskind, mbautistamartin}@apple.com.
Pseudocode | Yes | Algorithm 1 (Training) and Algorithm 2 (Sampling), both on page 4. (An illustrative diffusion training/sampling sketch appears after the table.)
Open Source Code | No | The paper does not contain an explicit statement that the authors are releasing the code for the work described in this paper, nor a direct link to their source code repository.
Open Datasets | Yes | "We use two popular datasets: GEOM-QM9 and GEOM-DRUGS (Axelrod & Gomez-Bombarelli, 2022)." and Axelrod, S. and Gomez-Bombarelli, R. GEOM, energy-annotated molecular conformations for property prediction and molecular generation. Scientific Data, 9(1):185, 2022. (References section)
Dataset Splits | Yes | "In our experiments, we split GEOM-QM9 and GEOM-DRUGS randomly based on molecules into train/validation/test (80%/10%/10%). At the end, for each dataset, we report the performance on 1000 test molecules. Thus, the splits contain 106586/13323/1000 and 243473/30433/1000 molecules for GEOM-QM9 and GEOM-DRUGS, respectively." (Appendix A.2) (A sketch of such a molecule-level split appears after the table.)
Hardware Specification | Yes | "For GEOM-QM9, we train models using a machine with 4 Nvidia A100 GPUs using precision BF16. For GEOM-DRUGS, we train models using precision FP32, where MCF-B is trained with 8 Nvidia A100 GPUs and MCF-L is trained with 16 Nvidia A100 GPUs." (Appendix A.2.3) (A BF16 training sketch appears after the table.)
Software Dependencies | No | The paper mentions software components and tools such as Perceiver IO and xTB, but it does not provide specific version numbers for these or for key software libraries such as Python, PyTorch, or CUDA, which are necessary for full reproducibility.
Experiment Setup | Yes | "An AdamW (Loshchilov & Hutter, 2017) optimizer is employed during training with a learning rate of 1e-4. Cosine learning rate decay is deployed with 30K warmup steps. We use EMA with a decay of 0.999. Models are trained for 300K steps on GEOM-QM9 and 750K steps on GEOM-DRUGS. All models use an effective batch size of 512." (Appendix A.2.1) and Table 4 ("Hyperparameters and settings for MCF on different datasets."). (An optimizer/schedule/EMA sketch appears after the table.)
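
Pseudocode (Algorithms 1 and 2): the paper's algorithms are diffusion training and sampling applied to conformer coordinates. As a rough reference point only, below is a minimal generic DDPM-style training step and ancestral sampler, not the authors' code; the noise-prediction network interface `eps_model`, the linear beta schedule, and the (batch, atoms, 3) coordinate shape are illustrative assumptions.

```python
# Hedged sketch of a DDPM-style training step and sampler, in the spirit of
# Algorithm 1 (Training) and Algorithm 2 (Sampling). The `eps_model` interface,
# the linear beta schedule, and the (B, N, 3) coordinate shape are assumptions.
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def training_step(eps_model, x0):
    """One training step on clean atom coordinates x0 of shape (B, N, 3)."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)
    eps = torch.randn_like(x0)
    a_bar = alpha_bars.to(x0.device)[t].view(b, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps  # forward diffusion q(x_t | x_0)
    return F.mse_loss(eps_model(x_t, t), eps)             # regress the added noise

@torch.no_grad()
def sample(eps_model, shape, device="cpu"):
    """Ancestral sampling: start from Gaussian noise and denoise down to t = 0."""
    x = torch.randn(shape, device=device)
    for t in reversed(range(T)):
        tt = torch.full((shape[0],), t, device=device, dtype=torch.long)
        eps_hat = eps_model(x, tt)
        a, a_bar, beta = alphas[t].item(), alpha_bars[t].item(), betas[t].item()
        mean = (x - (1.0 - a) / (1.0 - a_bar) ** 0.5 * eps_hat) / a ** 0.5
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + beta ** 0.5 * noise
    return x
```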
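
Dataset splits: Appendix A.2 describes a random 80%/10%/10% split at the molecule level, with 1000 test molecules used for evaluation. A minimal sketch of such a split is given below; the molecule identifiers, the seed, and the helper name are illustrative assumptions, not the authors' split procedure.

```python
# Hedged sketch of a molecule-level 80/10/10 random split with a fixed-size
# evaluation subset, as described in Appendix A.2. Identifiers and seed are
# illustrative assumptions.
import random

def split_molecules(mol_ids, seed=0, n_test_eval=1000):
    rng = random.Random(seed)
    ids = list(mol_ids)
    rng.shuffle(ids)                      # split by molecule, not by conformer
    n = len(ids)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    test_eval = test[:n_test_eval]        # report metrics on 1000 test molecules
    return train, val, test_eval
```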
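
Hardware and precision: the BF16 setting reported for GEOM-QM9 would typically be implemented with autocast. A minimal sketch assuming a PyTorch training loop follows; the paper does not state its framework or exact mixed-precision recipe, and the model/optimizer/loss names are placeholders.

```python
# Hedged sketch of BF16 training on CUDA, matching the precision noted for
# GEOM-QM9 in Appendix A.2.3. Names are placeholders; the exact recipe used
# by the authors is not stated in the paper.
import torch

def train_step_bf16(model, optimizer, batch, loss_fn):
    optimizer.zero_grad(set_to_none=True)
    # BF16 autocast needs no gradient scaler, unlike FP16.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = loss_fn(model(batch))
    loss.backward()
    optimizer.step()
    return loss.detach()
```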
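
Experiment setup: the quoted optimization settings (AdamW at 1e-4, cosine decay after 30K warmup steps, EMA with decay 0.999, 300K steps on GEOM-QM9) can be approximated as below; the scheduler composition and the EMA helper are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the optimization setup quoted from Appendix A.2.1:
# AdamW (lr 1e-4), 30K warmup steps followed by cosine decay, and an EMA of
# the weights with decay 0.999. Total steps follow GEOM-QM9; everything else
# is an illustrative assumption.
import copy
import torch

def build_optimization(model, total_steps=300_000, warmup_steps=30_000):
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    warmup = torch.optim.lr_scheduler.LinearLR(
        optimizer, start_factor=1e-3, total_iters=warmup_steps)
    cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=total_steps - warmup_steps)
    scheduler = torch.optim.lr_scheduler.SequentialLR(
        optimizer, [warmup, cosine], milestones=[warmup_steps])
    return optimizer, scheduler

class EMA:
    """Exponential moving average of model parameters (decay 0.999)."""
    def __init__(self, model, decay=0.999):
        self.decay = decay
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        for s, p in zip(self.shadow.parameters(), model.parameters()):
            s.mul_(self.decay).add_(p, alpha=1.0 - self.decay)
```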