Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

LLM Meets Diffusion: A Hybrid Framework for Crystal Material Generation

Authors: Subhojyoti Khastagir, Kishalay Das, Pawan Goyal, Seung-Cheol Lee, Satadeep Bhattacharjee, Niloy Ganguly

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on benchmark datasets demonstrate that CrysLLMGen consistently outperforms state-of-the-art models in both structural and compositional validity. It also generates more stable, unique, and novel crystal structures compared to existing approaches. CrysLLMGen shows strong generative capability under conditional prompts, effectively producing materials aligning with specified atomic compositions and space group constraints.
Researcher Affiliation	Academia	1 Indian Institute of Technology, Kharagpur, India 2 Indo Korea Science and Technology Center, Bangalore, India Correspondence to Kishalay: EMAIL
Pseudocode	Yes	Algorithm 1 Sampling Process of CrysLLMGen
Open Source Code	Yes	Code is available at https://github.com/kdmsit/crysllmgen
Open Datasets	Yes	We use two popular material datasets for this task: Perov-5 [38, 39] and MP-20 [40]. While training all competitive models, we followed the standard dataset split of 60% for training, 20% for validation, and 20% for testing.
Dataset Splits	Yes	While training all competitive models, we followed the standard dataset split of 60% for training, 20% for validation, and 20% for testing.
Hardware Specification	No	Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [Yes] Justification: See section 5 and appendix C. Guidelines: The answer NA means that the paper does not include experiments. The paper should indicate the type of compute workers CPU or GPU, internal cluster, or cloud provider, including relevant memory and storage. The paper should provide the amount of compute required for each of the individual experimental runs as well as estimate the total compute.
Software Dependencies	No	We finetune the LLaMA-2 7B model for 1 epoch using the AdamW optimizer implemented via the transformers.Trainer interface. The learning rate is set to 0.0001. For the diffusion component, we use a batch size of 256 and adopt a cosine noise schedule. The model is trained for 1000 diffusion steps and inference is performed using 900 steps. The denoising network is implemented using a 6-layer CSPNet. Optimization is done using the Adam optimizer with a learning rate of 0.001.
Experiment Setup	Yes	LLM Component: We finetune the LLaMA-2 7B model for 1 epoch using the AdamW optimizer implemented via the transformers.Trainer interface. The learning rate is set to 0.0001. Diffusion Model: For the diffusion component, we use a batch size of 256 and adopt a cosine noise schedule. The model is trained for 1000 diffusion steps and inference is performed using 900 steps. The denoising network is implemented using a 6-layer CSPNet. Optimization is done using the Adam optimizer with a learning rate of 0.001.
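
The Experiment Setup row mentions a cosine noise schedule over 1000 diffusion steps but does not say which formulation is used. As a rough illustration only, the sketch below assumes the widely used squared-cosine form with a small offset s = 0.008 and a clipping value of 0.999; both constants, and the function names cosine_alpha_bar and cosine_betas, are assumptions for this example and are not taken from the paper or its repository.

```python
import math

# Number of diffusion steps quoted in the Experiment Setup row.
TRAIN_STEPS = 1000


def cosine_alpha_bar(t, num_steps, s=0.008):
    """Cumulative signal level at step t under an assumed squared-cosine schedule.

    The offset s is an assumed constant, not a value reported in the paper.
    """
    f_t = math.cos(((t / num_steps) + s) / (1 + s) * math.pi / 2) ** 2
    f_0 = math.cos(s / (1 + s) * math.pi / 2) ** 2
    return f_t / f_0


def cosine_betas(num_steps, s=0.008, max_beta=0.999):
    """Per-step noise variances derived from consecutive alpha_bar ratios.

    Values are clipped at max_beta (an assumed cap) to avoid a variance of 1
    at the final step.
    """
    betas = []
    for t in range(1, num_steps + 1):
        ratio = cosine_alpha_bar(t, num_steps, s) / cosine_alpha_bar(t - 1, num_steps, s)
        betas.append(min(1.0 - ratio, max_beta))
    return betas


betas = cosine_betas(TRAIN_STEPS)
```

Under these assumptions the schedule adds very little noise in the first steps and most of it near the end. How the 900 inference steps map onto the 1000 training steps is not described in the row above, so it is left out of the sketch.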