Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

LLM Meets Diffusion: A Hybrid Framework for Crystal Material Generation

Authors: Subhojyoti Khastagir, Kishalay Das, Pawan Goyal, Seung-Cheol Lee, Satadeep Bhattacharjee, Niloy Ganguly

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on benchmark datasets demonstrate that CrysLLMGen consistently outperforms state-of-the-art models in both structural and compositional validity. It also generates more stable, unique, and novel crystal structures compared to existing approaches. CrysLLMGen shows strong generative capability under conditional prompts, effectively producing materials aligning with specified atomic compositions and space group constraints.
Researcher Affiliation | Academia | 1 Indian Institute of Technology, Kharagpur, India; 2 Indo Korea Science and Technology Center, Bangalore, India. Correspondence to Kishalay: EMAIL
Pseudocode | Yes | Algorithm 1: Sampling Process of CrysLLMGen
Open Source Code | Yes | Code is available at https://github.com/kdmsit/crysllmgen
Open Datasets | Yes | We use two popular material datasets for this task: Perov-5 [38, 39] and MP-20 [40]. While training all competitive models, we followed the standard dataset split of 60% for training, 20% for validation, and 20% for testing.
Dataset Splits | Yes | While training all competitive models, we followed the standard dataset split of 60% for training, 20% for validation, and 20% for testing.
Hardware Specification | No | Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [Yes] Justification: See section 5 and appendix C. Guidelines: The answer NA means that the paper does not include experiments. The paper should indicate the type of compute workers CPU or GPU, internal cluster, or cloud provider, including relevant memory and storage. The paper should provide the amount of compute required for each of the individual experimental runs as well as estimate the total compute.
Software Dependencies | No | We finetune the LLaMA-2 7B model for 1 epoch using the AdamW optimizer implemented via the transformers.Trainer interface. The learning rate is set to 0.0001. For the diffusion component, we use a batch size of 256 and adopt a cosine noise schedule. The model is trained for 1000 diffusion steps and inference is performed using 900 steps. The denoising network is implemented using a 6-layer CSPNet. Optimization is done using the Adam optimizer with a learning rate of 0.001.
Experiment Setup | Yes | LLM Component: We finetune the LLaMA-2 7B model for 1 epoch using the AdamW optimizer implemented via the transformers.Trainer interface. The learning rate is set to 0.0001. Diffusion Model: For the diffusion component, we use a batch size of 256 and adopt a cosine noise schedule. The model is trained for 1000 diffusion steps and inference is performed using 900 steps. The denoising network is implemented using a 6-layer CSPNet. Optimization is done using the Adam optimizer with a learning rate of 0.001.
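
For readers who want to sanity-check the quoted setup, the following is a minimal sketch of two pieces the table mentions: a cosine noise schedule over 1000 diffusion steps and a 60/20/20 dataset split. It is not taken from the authors' repository. The schedule uses the common cosine form of Nichol and Dhariwal with offset s = 0.008, which is an assumption since the page does not state which variant the paper uses, and the names ReportedSetup, cosine_alpha_bar, and split_sizes are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Hyperparameters as quoted in the Experiment Setup row (container name is hypothetical)."""
    llm_epochs: int = 1
    llm_learning_rate: float = 1e-4        # AdamW, LLaMA-2 7B fine-tuning
    diffusion_batch_size: int = 256
    diffusion_train_steps: int = 1000
    diffusion_inference_steps: int = 900
    denoiser_layers: int = 6               # CSPNet depth
    diffusion_learning_rate: float = 1e-3  # Adam


def cosine_alpha_bar(num_steps: int, s: float = 0.008) -> list:
    """Return the cumulative signal level alpha_bar_t for t = 0..num_steps.

    Assumed form: alpha_bar_t = f(t) / f(0), with
    f(t) = cos(((t / T) + s) / (1 + s) * pi / 2) ** 2.
    The offset s is an assumption, not a value reported on this page.
    """
    def f(t: int) -> float:
        return math.cos(((t / num_steps) + s) / (1 + s) * math.pi / 2) ** 2

    f0 = f(0)
    return [f(t) / f0 for t in range(num_steps + 1)]


def split_sizes(n: int, percents=(60, 20, 20)) -> tuple:
    """Split n samples into train/val/test counts using integer percentages.

    The test set absorbs any remainder so the three counts always sum to n.
    """
    n_train = n * percents[0] // 100
    n_val = n * percents[1] // 100
    return n_train, n_val, n - n_train - n_val


if __name__ == "__main__":
    setup = ReportedSetup()
    alpha_bar = cosine_alpha_bar(setup.diffusion_train_steps)
    print(len(alpha_bar), alpha_bar[0], alpha_bar[-1])
    print(split_sizes(1000))
```

Integer percentages are used for the split so the counts do not depend on floating-point rounding; how the actual Perov-5 and MP-20 splits are materialized is determined by the datasets' standard split files, not by this sketch.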