Crystal Diffusion Variational Autoencoder for Periodic Material Generation
Authors: Tian Xie, Xiang Fu, Octavian-Eugen Ganea, Regina Barzilay, Tommi S. Jaakkola
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We significantly outperform past methods in three tasks: 1) reconstructing the input structure, 2) generating valid, diverse, and realistic materials, and 3) generating materials that optimize a specific property. We also provide several standard datasets and evaluation metrics for the broader machine learning community. |
| Researcher Affiliation | Academia | Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge, MA 02139, USA |
| Pseudocode | Yes | Algorithm 1 Material Generation via Annealed Langevin Dynamics (a hedged sketch of this sampler appears after the table) |
| Open Source Code | Yes | Code and data are available at https://github.com/txie-93/cdvae |
| Open Datasets | Yes | We curated 3 datasets representing different types of material distributions. 1) Perov-5 (Castelli et al., 2012a;b)... 2) Carbon-24 (Pickard, 2020)... 3) MP-20 (Jain et al., 2013) |
| Dataset Splits | Yes | We use a 60-20-20 random split for all of our experiments. (A split sketch appears after the table.) |
| Hardware Specification | Yes | Time used for generating 10,000 materials on a single RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions several software components like 'pymatgen', 'CrystalNN', 'DimeNet++', 'GemNet-dQ', and the 'Open Catalyst Project (OCP)'. However, it does not provide specific version numbers for these software dependencies, which are required for full reproducibility. |
| Experiment Setup | Yes | The total loss can be written as $\mathcal{L} = \mathcal{L}_{\text{AGG}} + \mathcal{L}_{\text{DEC}} + \mathcal{L}_{\text{KL}} = \lambda_c \mathcal{L}_c + \lambda_L \mathcal{L}_L + \lambda_N \mathcal{L}_N + \lambda_X \mathcal{L}_X + \lambda_A \mathcal{L}_A + \beta \mathcal{L}_{\text{KL}}$. We aim to keep each loss term at a similar scale. For all three datasets, we use $\lambda_c = 1$, $\lambda_L = 10$, $\lambda_N = 1$, $\lambda_X = 10$, $\lambda_A = 1$. We tune $\beta$ over $\{0.01, 0.03, 0.1\}$ for all three datasets and select the model with the best validation loss. For Perov-5 and MP-20 we use $\beta = 0.01$; for Carbon-24 we use $\beta = 0.03$. For the noise levels $\{\sigma_{A,j}\}_{j=1}^{L}$ and $\{\sigma_{X,j}\}_{j=1}^{L}$, we follow Shi et al. (2021) and set $L = 50$. For all three datasets, we use $\sigma_{A,\max} = 5$, $\sigma_{A,\min} = 0.01$, $\sigma_{X,\max} = 10$, $\sigma_{X,\min} = 0.01$. During training, we use an initial learning rate of 0.001 and reduce it by a factor of 0.6 if the validation loss does not improve after 30 epochs; the minimum learning rate is 0.0001. During generation, we use $\epsilon = 0.0001$ and run Langevin dynamics for 100 steps at each noise level. (An illustrative configuration sketch appears after the table.) |
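The paper's Algorithm 1 generates materials by running Langevin dynamics over atom coordinates while annealing the noise level downward. Below is a minimal PyTorch sketch of that sampling loop, assuming a `score_fn(x, sigma)` that stands in for the decoder's predicted score; the step-size rule $\alpha_j = \epsilon \cdot \sigma_j^2 / \sigma_L^2$ follows Song & Ermon (2019), on which the paper builds, and all names here are illustrative rather than the repository's API.

```python
import math
import torch

def annealed_langevin_sample(score_fn, x_init, sigmas, n_steps=100, eps=1e-4):
    """Annealed Langevin dynamics (a sketch of the paper's Algorithm 1).

    score_fn(x, sigma): estimated score, i.e. gradient of log p(x) at noise
        level sigma (in CDVAE this comes from the GemNet-dQ decoder).
    sigmas: decreasing noise levels sigma_1 > ... > sigma_L; the quoted setup
        uses L = 50 levels, 100 steps per level, and eps = 1e-4.
    """
    x = x_init.clone()
    sigma_last = sigmas[-1]
    for sigma in sigmas:
        # Step size alpha_j = eps * sigma_j^2 / sigma_L^2 shrinks with the noise level.
        alpha = eps * (sigma / sigma_last) ** 2
        for _ in range(n_steps):
            noise = math.sqrt(alpha) * torch.randn_like(x)
            x = x + 0.5 * alpha * score_fn(x, sigma) + noise
    return x
```

In the full model the same annealing loop also updates atom types from the predicted type distribution, and fractional coordinates are wrapped back into the unit cell; both are omitted here for brevity.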
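The 60-20-20 random split is straightforward to reproduce. One minimal sketch using scikit-learn follows; the seed and index list are illustrative, as the paper does not specify them.

```python
from sklearn.model_selection import train_test_split

indices = list(range(10_000))  # placeholder for the dataset's entry indices
train_idx, rest = train_test_split(indices, test_size=0.4, random_state=0)
val_idx, test_idx = train_test_split(rest, test_size=0.5, random_state=0)
# Result: 60% train, 20% validation, 20% test
```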
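The training and sampling hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. The geometric spacing of the noise levels is an assumption carried over from Shi et al. (2021), which the paper cites for this schedule; the placeholder model, the Adam optimizer, and the dictionary-based loss are illustrative, not the repository's interfaces.

```python
import numpy as np
import torch

# Noise schedules: L = 50 levels (geometric spacing assumed, per Shi et al., 2021).
L = 50
sigmas_A = np.geomspace(5.0, 0.01, num=L)   # atom types: sigma_A,max = 5,  sigma_A,min = 0.01
sigmas_X = np.geomspace(10.0, 0.01, num=L)  # coordinates: sigma_X,max = 10, sigma_X,min = 0.01

# Loss weights from the paper: lambda_c = 1, lambda_L = 10, lambda_N = 1,
# lambda_X = 10, lambda_A = 1.
lambdas = {"c": 1.0, "L": 10.0, "N": 1.0, "X": 10.0, "A": 1.0}
beta = 0.01  # 0.01 for Perov-5 and MP-20; 0.03 for Carbon-24

def total_loss(losses):
    """Weighted sum: sum_k lambda_k * L_k + beta * L_KL, given per-term losses."""
    return sum(lambdas[k] * losses[k] for k in lambdas) + beta * losses["KL"]

# Optimizer: lr 1e-3, reduced by 0.6 on a 30-epoch validation plateau, floor 1e-4.
model = torch.nn.Linear(8, 8)  # placeholder for the CDVAE network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.6, patience=30, min_lr=1e-4
)
```

After each validation pass one would call `scheduler.step(val_loss)` so the plateau counter tracks validation loss, matching the quoted schedule.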