Generative Hierarchical Materials Search

Authors: Sherry Yang, Simon Batzner, Ruiqi Gao, Muratahan Aykol, Alexander Gaunt, Brendan C McMorrow, Danilo Jimenez Rezende, Dale Schuurmans, Igor Mordatch, Ekin Dogus Cubuk

NeurIPS 2024

Reproducibility assessment. Each entry lists the variable, the result, and the supporting LLM response.
Research Type: Experimental
  Evidence: "Experiments show that GenMS outperforms other alternatives of directly using language models to generate structures, both in satisfying user requests and in generating low-energy structures. We confirm that GenMS is able to generate common crystal structures such as double perovskites or spinels solely from natural language input, and hence can form the foundation for more complex structure generation in the near future."
Researcher Affiliation: Industry
  Evidence: "Sherry Yang, Simon Batzner, Ruiqi Gao, Muratahan Aykol, Alexander Gaunt, Brendan McMorrow, Danilo Rezende, Dale Schuurmans, Igor Mordatch, Ekin Dogus Cubuk — Google DeepMind"
Pseudocode: Yes
  Evidence: Algorithm 1, "Generative Hierarchical Materials Search"
Open Source Code: No
  Evidence: The NeurIPS checklist (Question 5) states: "Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: The data and prompts are all public and we have provided the information. We are working on open sourcing the code after going through the internal approval process."
Open Datasets: Yes
  Evidence: "Meanwhile, many crystal databases already feature paired data, D_lo = {(z_i, x_i)}_{i=1}^n, linking chemical formulae to detailed crystal structures. Given this observation, we propose to factorize the crystal generator as π = π_hi · π_lo, where π_hi : G → Δ(Z) and π_lo : Z → Δ(X), so that π_hi and π_lo can be trained using different datasets D_hi and D_lo." The paper also mentions and cites crystal databases such as the Materials Project [10], ICSD [11], OQMD [12], and NOMAD [25].
Dataset Splits: No
  Evidence: The paper describes training and testing procedures, but does not explicitly specify a validation split (e.g., percentages or counts for a validation set used during training) distinct from the test set.
Hardware Specification: Yes
  Evidence: "Training hardware: 64 TPU-v4 chips"
Software Dependencies: No
  Evidence: The paper mentions software such as VASP, pymatgen, and atomate, and specifies the Adam optimizer parameters, but does not provide version numbers for these software dependencies or libraries.
Experiment Setup: Yes
  Evidence: Table 8, "Hyperparameters for training the diffusion model in GenMS", lists specific values such as learning rate 5e-5, batch size 512, training steps 200,000, and optimizer Adam (β1 = 0.9, β2 = 0.99).
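The factorized generator quoted in the Open Datasets entry (π = π_hi · π_lo, with π_hi mapping a language query to candidate formulae and π_lo mapping a formula to a structure) can be sketched as a toy two-stage pipeline. All function names and the candidate lists below are illustrative stand-ins, not the authors' implementation; the real system samples formulae from an LLM and structures from a diffusion model.

```python
import random


def pi_hi(query: str) -> list[str]:
    """Illustrative high-level generator pi_hi: G -> Delta(Z).
    Maps a natural-language query to candidate chemical formulae
    (a stand-in for the paper's LLM stage)."""
    # Hypothetical keyword lookup; the real pi_hi is a language model.
    candidates = {"perovskite": ["CaTiO3", "SrTiO3"], "spinel": ["MgAl2O4"]}
    for key, formulae in candidates.items():
        if key in query.lower():
            return formulae
    return ["NaCl"]


def pi_lo(formula: str) -> dict:
    """Illustrative low-level generator pi_lo: Z -> Delta(X).
    Maps a formula to a crystal structure (a stand-in for the
    paper's diffusion model)."""
    return {"formula": formula, "structure": f"<structure for {formula}>"}


def generate(query: str, seed: int = 0) -> dict:
    """Factorized sampling: draw z ~ pi_hi(query), then x ~ pi_lo(z)."""
    rng = random.Random(seed)
    formula = rng.choice(pi_hi(query))
    return pi_lo(formula)
```

This mirrors why the two stages can be trained on different datasets: pi_hi only needs (query, formula) pairs, while pi_lo only needs (formula, structure) pairs.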
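The Table 8 hyperparameters reported in the Experiment Setup entry can be collected into a single training configuration. This is a minimal sketch: the dict keys are illustrative names, and only the values come from the paper.

```python
# Training hyperparameters for the GenMS diffusion model as reported in
# Table 8 of the paper. Key names are hypothetical; values are the paper's.
DIFFUSION_TRAIN_CONFIG = {
    "learning_rate": 5e-5,
    "batch_size": 512,
    "training_steps": 200_000,
    "optimizer": "adam",
    "adam_beta1": 0.9,
    "adam_beta2": 0.99,
}
```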