Scalable Diffusion for Materials Generation

Authors: Sherry Yang, KwangHwan Cho, Amil Merchant, Pieter Abbeel, Dale Schuurmans, Igor Mordatch, Ekin Dogus Cubuk

ICLR 2024

Reproducibility Variables
Research Type: Experimental
LLM Response: We first evaluate UniMat on a set of proxy metrics proposed by Xie et al. (2021), and show that UniMat generally works better than the previous state-of-the-art graph-based approach as well as recent language model (Flam-Shepherd & Aspuru-Guzik, 2023) and diffusion model (Pakornchote et al., 2023) baselines. However, we are ultimately interested in whether the generated materials are physically valid and can be synthesized in a laboratory (e.g., low-energy materials). We found that proxy metrics based on learning a separate energy network either saturate or fall short in evaluating generated materials reliably in the context of materials discovery (i.e., generating materials that have not been seen by the energy prediction network). To answer this question, we run DFT relaxations (Hafner, 2008) to compute the formation energy of the generated materials, which is more widely accepted in materials science than learned proxy metrics (Bartel et al., 2020). We then use per-composition formation energy and stability with respect to the convex hull (through decomposition energy) as more reliable metrics for evaluating generative models for materials. UniMat drastically outperforms the previous state of the art according to these DFT-based metrics. Lastly, we scale UniMat to train on all experimentally verified stable materials as well as additional stable/semi-stable materials found through search and substitution (over 2 million structures in total). We show that predicting material structures conditioned on element types can generalize (in a zero-shot manner) to predicting more difficult structures that are not neighboring structures of the training set, achieving better efficiency than the predominant random structure search. This allows for the possibility of effectively discovering new materials with desired properties.
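The stability metric referenced above can be made concrete. Below is a minimal sketch of computing decomposition energy and energy above the convex hull with pymatgen (which the paper mentions in Appendix B); the compositions and energies here are hypothetical placeholders for illustration, not values from the paper.

```python
# Minimal sketch: decomposition energy / energy above hull with pymatgen.
# Reference entries and the candidate energy are hypothetical; the paper
# forms its hull from the full Materials Project database (July 2021).
from pymatgen.core import Composition
from pymatgen.analysis.phase_diagram import PhaseDiagram, PDEntry

# Reference entries defining the convex hull (total energies in eV would
# normally come from DFT relaxations of known phases).
reference_entries = [
    PDEntry(Composition("Li"), 0.0),
    PDEntry(Composition("O2"), 0.0),
    PDEntry(Composition("Li2O"), -6.0),  # hypothetical total energy
]
hull = PhaseDiagram(reference_entries)

# A generated material with its DFT-relaxed total energy (hypothetical value).
candidate = PDEntry(Composition("Li2O2"), -5.0)

# Decomposition products and energy above the hull (eV/atom); a candidate
# with e_above_hull == 0 lies on the known hull, i.e. is predicted stable.
decomp, e_above_hull = hull.get_decomp_and_e_above_hull(candidate)
print(f"E above hull: {e_above_hull:.3f} eV/atom")
print("Decomposes into:",
      {entry.composition.reduced_formula: round(frac, 3)
       for entry, frac in decomp.items()})
```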
Researcher Affiliation: Collaboration
LLM Response: Sherry Yang (1,2), KwangHwan Cho (2), Amil Merchant (1), Pieter Abbeel (2), Dale Schuurmans (1,3), Igor Mordatch (1), Ekin Dogus Cubuk (1); 1 Google DeepMind, 2 UC Berkeley, 3 University of Alberta.
Pseudocode: No
LLM Response: The paper describes the model architecture and procedures in text and refers to appendices for details, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code: Yes
LLM Response: See website at https://unifiedmaterials.github.io.
Open Datasets: Yes
LLM Response: We begin the evaluation following the same setup as CDVAE (Xie et al., 2021), using the Perov-5, Carbon-24, and MP-20 materials datasets. ... Specifically, we downloaded the full Materials Project database (Jain et al., 2013) from July 2021, and used this to form the convex hull.
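For reference, Materials Project entries of the kind used to form the hull can be queried programmatically. A minimal sketch, assuming the legacy pymatgen MPRester client ("YOUR_API_KEY" is a placeholder; the newer mp-api client differs, and the paper itself used a full July 2021 database snapshot rather than a per-system query):

```python
# Minimal sketch of pulling Materials Project entries to build a convex hull.
from pymatgen.ext.matproj import MPRester
from pymatgen.analysis.phase_diagram import PhaseDiagram

with MPRester("YOUR_API_KEY") as mpr:
    # All computed entries in the Li-O chemical system (elements + compounds).
    entries = mpr.get_entries_in_chemsys(["Li", "O"])

hull = PhaseDiagram(entries)
print(f"{len(entries)} entries; stable phases:",
      [e.composition.reduced_formula for e in hull.stable_entries])
```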
Dataset Splits: Yes
LLM Response: We begin the evaluation following the same setup as CDVAE (Xie et al., 2021), using the Perov-5, Carbon-24, and MP-20 materials datasets.
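The quoted setup implies the splits are inherited from CDVAE, whose released code uses a 60/20/20 train/validation/test split for these datasets. A minimal sketch of such a split (the ratio is an assumption carried over from the CDVAE setup, not stated in the quote above):

```python
# Minimal sketch of a seeded random train/val/test split; the 60/20/20 ratio
# follows the CDVAE setup (an assumption, not quoted from this paper).
import random

def split_dataset(items, seed=42, frac_train=0.6, frac_val=0.2):
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * frac_train)
    n_val = int(len(shuffled) * frac_val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Dummy stand-in for a list of structures (MP-20 itself has ~45k structures).
train, val, test = split_dataset(list(range(20_000)))
print(len(train), len(val), len(test))
```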
Hardware Specification: Yes
LLM Response: Training hardware: 32 TPU-v4 chips.
Software Dependencies: No
LLM Response: The paper mentions software such as VASP, pymatgen, and atomate, and methods such as the PBE functional and PAW potentials (Appendix B), but does not provide version numbers for these components, which are needed for reproducibility.
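For reference, the Python-package side of this information is easy to record at run time; a minimal sketch (VASP builds and POTCAR/PAW data sets are not Python packages and would need to be logged separately):

```python
# Minimal sketch of recording the installed versions of the Python packages
# the paper mentions, i.e. the information this review flags as missing.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("pymatgen", "atomate"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```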
Experiment Setup: Yes
LLM Response: The hyperparameters used in training the UniMat diffusion model are summarized in Table 4 (Appendix A).