Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

MS-BART: Unified Modeling of Mass Spectra and Molecules for Structure Elucidation

Authors: Yang Han, Pengyu Wang, Kai Yu, Xin Chen, Lu Chen

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive evaluations demonstrate that MS-BART achieves SOTA performance across 5/12 key metrics on MassSpecGym and NPLIB1 and is faster by one order of magnitude than competing diffusion-based methods, while comprehensive ablation studies systematically validate the model's effectiveness and robustness.
Researcher Affiliation	Academia	Yang Han 1,2, Pengyu Wang 1,2, Kai Yu 1,2,4, Xin Chen 2*, Lu Chen 1,2,3,4. (1) X-LANCE Lab, School of Computer Science, MoE Key Lab of Artificial Intelligence, SJTU AI Institute, Shanghai Jiao Tong University, Shanghai, China; (2) Suzhou Laboratory, Suzhou, China; (3) Shanghai Innovation Institute, Shanghai, China; (4) Jiangsu Key Lab of Language Computing, Suzhou, China. EMAIL, EMAIL
Pseudocode	No	The paper describes the methodology in Section 3 and Figure 2, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	We provide the data and code at https://github.com/OpenDFM/MS-BART.
Open Datasets	Yes	We evaluate our MS-BART model on two widely used open-source benchmarks: NPLIB1 [10] and MassSpecGym [7], following prior works [4, 45].
Dataset Splits	Yes	The dataset is partitioned into training, validation, and test sets based on the edit distance between molecular structures, facilitating robust evaluation. ... For the NPLIB1 dataset, we first evaluated the structural similarity between its test set and our pretraining data.
Hardware Specification	Yes	We systematically evaluated MS-BART's performance on the complete MassSpecGym test fold using an NVIDIA A800-SXM4-80GB GPU... The training was conducted on four NVIDIA A800-SXM4-80GB GPUs using bfloat16 precision. ... Fine-tuning is performed on a single NVIDIA A800-SXM4-80GB GPU using bfloat16 precision... when tested on a common consumer-grade GPU like the RTX 4090 with a beam width of 100...
Software Dependencies	No	The paper mentions software components like BART-BASE, RDKit, and MIST, but does not provide specific version numbers for these or other software dependencies.
Experiment Setup	Yes	During pretraining, we set the maximum sequence length to 512... For finetuning and alignment, we fix the input and output token lengths to 256... We adopt a learning rate of 5e-5 combined with a warm-up phase covering 10% of total training steps. Each training iteration processes 128 samples per batch... Pretraining... A per-device batch size of 96 and two gradient accumulation steps were employed, resulting in an effective total batch size of 768 across three training epochs. The optimization process adopted a cosine learning rate scheduler with a warm-up phase of 10,000 steps. The learning rate increased linearly from zero to a peak value of 6e-4 during warm-up and subsequently decayed following a cosine schedule to a minimum value of 1e-5. Table 6: Hyper-parameter settings lists Learning Rate {1e-5, 5e-5}, Candidate Margin γ {0.05, 0.1, 0.2}, Rank Loss Weight α {1, 3, 5}, Number of Candidates {3, 5}, Length Penalty Coefficient {1.4, 1.6}.
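
The pretraining numbers quoted in the Experiment Setup row can be sanity-checked with a short sketch. It reproduces the effective batch size (96 per device, 2 gradient accumulation steps, 4 GPUs as listed under Hardware Specification) and a linear warm-up followed by cosine decay. The function names, the `total_steps` argument, and the exact form of the cosine term are assumptions made for illustration; the quoted text does not give the total number of training steps, and the authors' scheduler implementation may differ.

```python
import math


def effective_batch_size(per_device: int, grad_accum_steps: int, num_devices: int) -> int:
    """Samples contributing to one optimizer update across all devices."""
    return per_device * grad_accum_steps * num_devices


def lr_at_step(step: int, total_steps: int, warmup_steps: int = 10_000,
               peak_lr: float = 6e-4, min_lr: float = 1e-5) -> float:
    """Linear warm-up from zero to peak_lr, then cosine decay to min_lr.

    total_steps is a hypothetical input: it is not stated in the quoted text.
    """
    if step < warmup_steps:
        # Warm-up phase: learning rate rises linearly from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Standard half-cosine: 1.0 at the start of decay, 0.0 at the end.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


if __name__ == "__main__":
    print(effective_batch_size(per_device=96, grad_accum_steps=2, num_devices=4))
    # Assumed schedule length of 100,000 steps, chosen only for this demo.
    for s in (0, 5_000, 10_000, 55_000, 100_000):
        print(s, lr_at_step(s, total_steps=100_000))
```

With the stated settings the first function returns 768, matching the effective total batch size reported in the row above.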