reproducibilityindex.ai

Crystal Diffusion Variational Autoencoder for Periodic Material Generation

Authors: Tian Xie, Xiang Fu, Octavian-Eugen Ganea, Regina Barzilay, Tommi S. Jaakkola

ICLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We significantly outperform past methods in three tasks: 1) reconstructing the input structure, 2) generating valid, diverse, and realistic materials, and 3) generating materials that optimize a specific property. We also provide several standard datasets and evaluation metrics for the broader machine learning community.
Researcher Affiliation	Academia	Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Pseudocode	Yes	Algorithm 1 Material Generation via Annealed Langevin Dynamics
Open Source Code	Yes	Code and data are available at https://github.com/txie-93/cdvae
Open Datasets	Yes	We curated 3 datasets representing different types of material distributions. 1) Perov-5 (Castelli et al., 2012a;b)... 2) Carbon-24 (Pickard, 2020)... 3) MP-20 (Jain et al., 2013)
Dataset Splits	Yes	We use a 60-20-20 random split for all of our experiments.
Hardware Specification	Yes	Time used for generating 10,000 materials on a single RTX 2080 Ti GPU.
Software Dependencies	No	The paper mentions several software components, including 'pymatgen', 'CrystalNN', 'DimeNet++', 'GemNet-dQ', and the 'Open Catalyst Project (OCP)'. However, it does not give version numbers for any of them, which full reproducibility would require.
Experiment Setup	Yes	The total loss can be written as $\mathcal{L} = \mathcal{L}_{\text{AGG}} + \mathcal{L}_{\text{DEC}} + \mathcal{L}_{\text{KL}} = \lambda_c \mathcal{L}_c + \lambda_L \mathcal{L}_L + \lambda_N \mathcal{L}_N + \lambda_X \mathcal{L}_X + \lambda_A \mathcal{L}_A + \beta \mathcal{L}_{\text{KL}}$. We aim to keep each loss term at a similar scale. For all three datasets, we use $\lambda_c = 1$, $\lambda_L = 10$, $\lambda_N = 1$, $\lambda_X = 10$, $\lambda_A = 1$. We tune $\beta$ between 0.01, 0.03, 0.1 for all three datasets and select the model with best validation loss. For Perov-5, MP-20, we use $\beta = 0.01$, and for Carbon-24, we use $\beta = 0.03$. For the noise levels in $\{\sigma_{A,j}\}_{j=1}^{L}$, $\{\sigma_{X,j}\}_{j=1}^{L}$, we follow Shi et al. (2021) and set $L = 50$. For all three datasets, we use $\sigma_{A,\max} = 5$, $\sigma_{A,\min} = 0.01$, $\sigma_{X,\max} = 10$, $\sigma_{X,\min} = 0.01$. During the training, we use an initial learning rate of 0.001 and reduce the learning rate by a factor of 0.6 if the validation loss does not improve after 30 epochs. The minimum learning rate is 0.0001. During the generation, we use $\epsilon = 0.0001$ and run Langevin dynamics for 100 steps at each noise level.
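
The settings quoted in the Dataset Splits and Experiment Setup rows can be gathered into a small dependency-free Python sketch. This is an illustration under stated assumptions, not the authors' implementation (their code is at the repository linked above). All function and class names here are hypothetical. The geometric spacing of the noise levels is an assumption, since the quoted text only says the authors follow Shi et al. (2021). The split seed and the exact patience semantics of the learning-rate rule are also assumptions.

```python
import random

# Values quoted in the Experiment Setup row; the dictionary keys are hypothetical labels.
LOSS_WEIGHTS = {"c": 1.0, "L": 10.0, "N": 1.0, "X": 10.0, "A": 1.0}
BETA = {"Perov-5": 0.01, "MP-20": 0.01, "Carbon-24": 0.03}
NUM_NOISE_LEVELS = 50


def random_split(n, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle indices 0..n-1 and cut them into train/val/test parts.

    The seed is an assumption; the quoted text does not state one.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def noise_levels(sigma_max, sigma_min, num_levels=NUM_NOISE_LEVELS):
    """Decreasing noise levels from sigma_max to sigma_min.

    Geometric spacing is assumed here, not confirmed by the quoted text.
    """
    ratio = (sigma_min / sigma_max) ** (1.0 / (num_levels - 1))
    return [sigma_max * ratio ** j for j in range(num_levels)]


def total_loss(terms, beta, weights=LOSS_WEIGHTS):
    """Weighted sum of the per-term losses plus beta times the KL term."""
    return sum(weights[k] * terms[k] for k in weights) + beta * terms["KL"]


class PlateauSchedule:
    """Simplified stand-in for a reduce-on-plateau learning-rate rule.

    The rate is multiplied by `factor` once the validation loss has failed to
    improve for more than `patience` consecutive epochs, and never drops
    below `min_lr`.
    """

    def __init__(self, lr=1e-3, factor=0.6, patience=30, min_lr=1e-4):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.bad_epochs = 0
        return self.lr


if __name__ == "__main__":
    train, val, test = random_split(1000)
    print(len(train), len(val), len(test))
    sigmas_x = noise_levels(sigma_max=10.0, sigma_min=0.01)
    print(len(sigmas_x), sigmas_x[0])
```

With unit values for every loss term, `total_loss` makes the relative scale of the weights visible: the lattice and coordinate terms each contribute ten times as much as the property, atom-count, and atom-type terms, while the KL term is scaled down by $\beta$.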