Equivariant Diffusion for Crystal Structure Prediction

Authors: Peijia Lin, Pin Chen, Rui Jiao, Qing Mo, Cen Jianhuan, Wenbing Huang, Yang Liu, Dan Huang, Yutong Lu

ICML 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments indicate that EquiCSP significantly surpasses existing models in terms of generating accurate structures and demonstrates faster convergence during the training process.
Researcher Affiliation Academia ¹School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; ²National Supercomputer Center in Guangzhou, China; ³Dept. of Comp. Sci. Tech., Institute for AI, Tsinghua University, Beijing, China; ⁴Institute for AIR, Tsinghua University, Beijing, China; ⁵Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; ⁶Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China.
Pseudocode Yes Algorithm 1: Training Procedure of EquiCSP; Algorithm 2: Sampling Procedure of EquiCSP
Open Source Code Yes Code is available at https://github.com/EmperorJia/EquiCSP.
Open Datasets Yes Experiments are carried out on three datasets, each varying in complexity. The Perov-5 dataset (Castelli et al., 2012a;b) comprises 18,928 perovskite materials... The dataset MP-20 comprises 45,231 stable inorganic materials curated from the Materials Project (Jain et al., 2013)... MPTS-52 represents a more challenging extension of MP-20... Carbon-24 (Pickard, 2020) encompasses 10,153 carbon materials...
Dataset Splits Yes For datasets such as Perov-5 and MP-20, we adhere to a 60-20-20 split for training, validation, and testing, respectively, aligning with the methodology of Jiao et al. (2023). Conversely, for MPTS-52, we allocate 27,380 entries for training, 5,000 for validation, and 8,096 for testing, arranged in chronological order.
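The 60-20-20 split quoted above can be sketched as a small helper. This is a hedged illustration, not the authors' code: the `split_dataset` name, the seed, and the use of a seeded shuffle are assumptions (the paper follows Jiao et al., 2023, whose exact shuffling details are not restated in this row).

```python
import random

def split_dataset(entries, seed=42):
    """Hypothetical 60/20/20 random split (assumed seeded shuffle).

    Mirrors the split described for Perov-5 and MP-20; MPTS-52 instead
    uses fixed chronological counts (27,380 / 5,000 / 8,096).
    """
    idx = list(range(len(entries)))
    random.Random(seed).shuffle(idx)  # deterministic shuffle for reproducibility
    n_train = int(0.6 * len(idx))
    n_val = int(0.2 * len(idx))
    train = [entries[i] for i in idx[:n_train]]
    val = [entries[i] for i in idx[n_train:n_train + n_val]]
    test = [entries[i] for i in idx[n_train + n_val:]]
    return train, val, test
```

For MPTS-52, the analogous operation would be simple slicing of the chronologically sorted entries rather than a random shuffle.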
Hardware Specification Yes All models are trained on one Nvidia A800 GPU.
Software Dependencies No The paper mentions using pymatgen's StructureMatcher class metrics and the SciPy library but does not provide specific version numbers for these or other software dependencies, which are necessary for full reproducibility.
Experiment Setup Yes For our EquiCSP, we employ a 4-layer setting with 256 hidden states for Perov-5 and a 6-layer setting with 512 hidden states for the other datasets. The dimension of the Fourier embedding is set to k = 256. We utilize the cosine scheduler with s = 0.008 to regulate the variance of the DDPM process on C_t, and an exponential scheduler with σ_1 = 0.005, σ_T = 0.5 to control the noise scale of the score-matching process on F_t. The diffusion step is set to T = 1000. Our model undergoes training for 3500, 4000, 1000, and 1000 epochs respectively for Perov-5, Carbon-24, MP-20, and MPTS-52, using the same optimizer and learning-rate scheduler as CDVAE. For the Langevin dynamics step size γ, we apply γ = 5×10⁻⁷ for Perov-5, γ = 5×10⁻⁶ for MP-20, and γ = 1×10⁻⁵ for MPTS-52; for ab initio generation in the Carbon-24 case we use γ = 1×10⁻⁵.
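The two noise schedules named in this row can be sketched from their standard definitions. This is a minimal illustration under stated assumptions: the cosine schedule is taken to be the Nichol & Dhariwal (2021) form with offset s = 0.008, and the exponential schedule is assumed to be log-linear between σ_1 = 0.005 and σ_T = 0.5; the function names are hypothetical, not the paper's API.

```python
import math

def cosine_alpha_bar(T=1000, s=0.008):
    """Cosine variance schedule for the DDPM process on C_t (assumed
    Nichol & Dhariwal form): alpha_bar(t) ∝ cos²(((t/T + s)/(1 + s)) · π/2),
    normalized so alpha_bar(0) = 1."""
    f = [math.cos(((t / T + s) / (1 + s)) * math.pi / 2) ** 2
         for t in range(T + 1)]
    return [x / f[0] for x in f]

def exp_sigma_schedule(T=1000, sigma_1=0.005, sigma_T=0.5):
    """Exponential (log-linear) noise scale for score matching on F_t:
    sigma_t interpolates geometrically from sigma_1 at t=1 to sigma_T at t=T."""
    return [sigma_1 * (sigma_T / sigma_1) ** ((t - 1) / (T - 1))
            for t in range(1, T + 1)]
```

With T = 1000 as in the paper, `cosine_alpha_bar` decays monotonically from 1 toward 0, and `exp_sigma_schedule` grows geometrically from 0.005 to 0.5, matching the quoted endpoints.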