Invariant Tokenization of Crystalline Materials for Language Model Enabled Generation

Authors: Keqiang Yan, Xiner Li, Hongyi Ling, Kenna Ashen, Carl Edwards, Raymundo Arroyave, Marinka Zitnik, Heng Ji, Xiaofeng Qian, Xiaoning Qian, Shuiwang Ji

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct experiments on four datasets following DiffCSP [17] and CrystaLLM [12], including Perov-5, Carbon-24, MP-20, and MPTS-52." (Section 4, Experimental Results)
Researcher Affiliation | Academia | (1) Texas A&M University, College Station, TX 77843, USA; (2) University of Illinois Urbana-Champaign, Champaign, IL 61820, USA; (3) Harvard University, Boston, MA 02115, USA
Pseudocode | No | The paper describes its method via text and a pipeline diagram (Figure 2) but does not include explicit pseudocode or algorithm blocks.
Open Source Code | No | "The code will be released after the paper is publicly available."
Open Datasets | Yes | "We conduct experiments on four datasets following DiffCSP [17] and CrystaLLM [12], including Perov-5, Carbon-24, MP-20, and MPTS-52. ... We have used datasets including Perov-5, Carbon-24, and MP-20 curated by CDVAE [15] with MIT License, MPTS-52 curated by DiffCSP [17] with MIT License, JARVIS-DFT [42] with NIST License, CrystaLLM [12] with MIT License, the Materials Project [26] with Creative Commons Attribution 4.0 License, OQMD [30] with Creative Commons Attribution 4.0 International License, and NOMAD [31] with Apache License Version 2.0, January 2004."
Dataset Splits | Yes | "We directly follow DiffCSP [17] to split corresponding datasets into training, evaluation, and test sets." (A data-loading sketch matching this layout appears after the table.)
Hardware Specification | Yes | "A single NVIDIA A100 GPU is used for computing for this task."
Software Dependencies | No | The paper mentions using GPT-2 as the language model but does not specify software dependencies like programming languages, libraries, or frameworks with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | "We show the detailed training parameters, including window size, batch size, learning rate, dropout ratio, and number of training iterations for different tasks, in Table 7. ... During the sampling phase, for the Perov-5 dataset, we use temperature=0.7 and top-k=10 for one-shot generation..."
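
The Perov-5 sampling settings quoted in the last row map directly onto standard autoregressive decoding. Below is a minimal sketch (not the authors' released code) assuming a HuggingFace Transformers GPT-2 checkpoint fine-tuned on tokenized crystal sequences; the checkpoint name "gpt2", the prompt string, and the max_new_tokens budget are illustrative placeholders.

```python
# Sketch of one-shot sampling with a GPT-2 language model using the reported
# Perov-5 settings (temperature=0.7, top-k=10). Checkpoint, prompt, and
# max_new_tokens are placeholders, not the authors' configuration.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")  # placeholder checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "<crystal>"  # hypothetical start token for a tokenized crystal sequence
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    do_sample=True,          # stochastic decoding
    temperature=0.7,         # reported Perov-5 temperature
    top_k=10,                # reported Perov-5 top-k
    num_return_sequences=1,  # one-shot generation: a single sample per prompt
    max_new_tokens=256,      # placeholder sequence-length budget
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```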
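
For the Dataset Splits row: the CDVAE/DiffCSP benchmarks distribute each dataset with pre-split files, so "following DiffCSP" amounts to reading those files as-is. A minimal sketch follows, assuming the CDVAE-style layout in which each dataset directory ships train.csv, val.csv, and test.csv; the directory path is a placeholder.

```python
# Sketch of loading the pre-split CDVAE/DiffCSP benchmark files for one
# dataset (e.g., Perov-5). The path below is a placeholder.
import pandas as pd

DATA_DIR = "data/perov_5"  # placeholder path to the downloaded dataset

splits = {
    name: pd.read_csv(f"{DATA_DIR}/{name}.csv")
    for name in ("train", "val", "test")
}
for name, frame in splits.items():
    print(f"{name}: {len(frame)} structures")
```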