Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Space Group Equivariant Crystal Diffusion

Authors: Rees Chang, Angela Pak, Alex Guerra, Ni Zhan, Nick Richardson, Elif Ertekin, Ryan P. Adams

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | SGEquiDiff achieves state-of-the-art performance on standard benchmark datasets as assessed by quantitative proxy metrics and quantum mechanical calculations. We evaluated SGEquiDiff on two benchmark datasets: MP20 [100], containing 45,231 experimentally known crystals with up to 20 atoms per unit cell, and the more challenging MPTS52 dataset [4], containing 40,476 experimentally known crystals with up to 52 atoms per unit cell. Evaluations were conducted on 10,000 generated crystals.
Researcher Affiliation | Academia | (1) Department of Materials Science & Engineering, University of Illinois at Urbana-Champaign; (2) Department of Computer Science, Princeton University; (3) Materials Research Laboratory, University of Illinois at Urbana-Champaign
Pseudocode | Yes | A.4.1 Wyckoff-Element Transformer. Our encoder-decoder Transformer architecture can be summarized as follows: z^0 ← MLP(e_SG || e_W || e_A); z^{l+1} ← EncoderLayer(z^l_i, m_causal); z_W, z_A ← Split(z^{l_max}); z_A ← MLP(z_A, e_W); p_{W,stop} ← Attention(K = [e^all_W || e_stop], V = [e^all_W || e_stop], Q = z_W, mask = m_W); p_A ← Attention(K = e^all_A, V = e^all_A, Q = z_A, mask = m_A)
Open Source Code | Yes | Our code is available at https://github.com/rees-c/sgequidiff.
Open Datasets | Yes | We evaluated SGEquiDiff on two benchmark datasets: MP20 [100], containing 45,231 experimentally known crystals with up to 20 atoms per unit cell, and the more challenging MPTS52 dataset [4], containing 40,476 experimentally known crystals with up to 52 atoms per unit cell.
Dataset Splits | Yes | Data splits were the same as provided by Xie et al. [100] and Baird et al. [4].
Hardware Specification | Yes | Average sampling times per batch of 500 crystals were measured on an NVIDIA A40 GPU. The model was trained with the Adam optimizer [50] on a single NVIDIA A40 GPU.
Software Dependencies | No | Our code was written with PyTorch [72] and PyTorch Geometric [21]. Using sympy [61] and PyXtal [22], we removed redundancy induced by space group symmetry from every Wyckoff position by intersecting each one with its exact asymmetric unit. The paper mentions software names but does not provide specific version numbers for them.
Experiment Setup | Yes | We list hyperparameters and training times for SGEquiDiff in Sec. A.7. (This refers to Table A.7, "Training and hyperparameters", which explicitly lists hyperparameter values such as batch size 256, number of epochs 1000, and lattice sampler hidden dimension 256.)
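
The last two steps quoted in the Pseudocode row produce probabilities from an attention operation with a mask. As a rough illustration only, the sketch below assumes those probabilities are a masked softmax over scaled dot products between one query vector and a set of candidate key embeddings. That reading is an assumption of this sketch, as are the function name `masked_attention_probs`, the 1/sqrt(d) scaling, and the toy vectors; none of them is taken from the paper or its repository. The value vectors are omitted because only the attention weights are computed here.

```python
import math
from typing import List, Sequence


def masked_attention_probs(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    mask: Sequence[bool],
) -> List[float]:
    """Masked softmax of scaled dot products between one query and several keys.

    mask[i] is True when candidate i is allowed. Disallowed candidates get
    probability exactly 0. Hypothetical helper, not the authors' code.
    """
    if len(keys) != len(mask):
        raise ValueError("keys and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one candidate must be allowed")

    dim = len(query)
    scores = []
    for key, allowed in zip(keys, mask):
        if allowed:
            dot = sum(q * k for q, k in zip(query, key))
            scores.append(dot / math.sqrt(dim))
        else:
            # A score of -inf becomes weight 0 after exponentiation.
            scores.append(-math.inf)

    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Toy usage: three candidates, the third one is masked out.
probs = masked_attention_probs(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]],
    mask=[True, True, False],
)
```

In the toy call, the masked third candidate receives zero probability even though its raw dot product is the largest, and the remaining mass is shared between the two allowed candidates in favour of the one aligned with the query.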