Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Reaction Prediction via Interaction Modeling of Symmetric Difference Shingle Sets

Authors: Runhan Shi, Letian Chen, Gufeng Yu, Yang Yang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that ReaDISH improves reaction prediction performance across diverse benchmarks. It shows enhanced robustness with an average improvement of 8.76% on R² under permutation perturbations. Section 4 Experiments
Researcher Affiliation	Academia	(1) AGI Institute, School of Computer Science, Shanghai Jiao Tong University; (2) Shanghai Innovation Institute
Pseudocode	Yes	Algorithm 1 Pipeline to extract symmetric difference shingles
Open Source Code	Yes	The code is available at https://github.com/Meteor-han/ReaDISH.
Open Datasets	Yes	We collect 3.7M chemical reactions for pre-training based on the United States Patent and Trademark Office (USPTO) dataset [38] and the Chemical Journals with High Impact Factor (CJHIF) dataset [39]. We use seven datasets across a wide range of chemical tasks, including: (1) yield prediction, the Buchwald-Hartwig (BH) dataset [13], the Suzuki-Miyaura (SM) dataset [14], the real-world electronic laboratory notebook (ELN) dataset [40], and the Ni-catalyzed C-O bond activation (NiCOlit) dataset [41]; (2) enantioselectivity prediction, the asymmetric N,S-acetal formation (N,S-acetal) dataset [42]; (3) conversion rate estimation, the C-heteroatom-coupling reactions (C-heteroatom) dataset [43]; and (4) reaction type classification, the USPTO_TPL dataset [8].
Dataset Splits	Yes	To assess the generalizability of our approach, we consider both random and out-of-sample splits. In the out-of-sample split, the test set contains reactions involving molecules that do not appear in the training set. Table 3: The statistics of pre-training datasets (first row) and evaluation datasets (remaining rows). All methods are tested on (1) the same ten random splits and (2) the same out-of-sample split across five random runs to ensure fair comparisons, with the average results reported.
Hardware Specification	Yes	All experiments are executed on 4 NVIDIA RTX3090 GPUs.
Software Dependencies	No	We use PyTorch [62] with the Adam [63] optimizer and the cosine learning rate decay strategy for training. We apply K-means clustering by scikit-learn [61] with different values of K. We remove duplicate records and invalid reactions for pre-training by RDKit [59].
Experiment Setup	Yes	Table 5: Parameters during pre-training. Table 6: Search space of parameters during fine-tuning. We use PyTorch [62] with the Adam [63] optimizer and the cosine learning rate decay strategy for training.
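
The out-of-sample split quoted in the Dataset Splits row (test reactions involve molecules that never appear in training) can be illustrated with a minimal sketch. This is not the authors' code: the function name `out_of_sample_split`, the `held_out_fraction` parameter, and the representation of a reaction as a tuple of molecule identifiers are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical representation: a reaction is a tuple of molecule identifiers.
Reaction = Tuple[str, ...]


def out_of_sample_split(
    reactions: Sequence[Reaction],
    held_out_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Reaction], List[Reaction], List[str]]:
    """Hold out a random subset of molecules and split reactions around it.

    Any reaction that uses a held-out molecule goes to the test set, so
    held-out molecules never occur in the training set.
    """
    rng = random.Random(seed)
    # Sort so that the sampled subset depends only on the seed.
    molecules = sorted({m for rxn in reactions for m in rxn})
    n_held_out = max(1, round(held_out_fraction * len(molecules)))
    held_out = set(rng.sample(molecules, n_held_out))

    train = [rxn for rxn in reactions if not held_out & set(rxn)]
    test = [rxn for rxn in reactions if held_out & set(rxn)]
    return train, test, sorted(held_out)


if __name__ == "__main__":
    # Toy data: three hypothetical substrates crossed with two hypothetical ligands.
    toy = [(a, b) for a in ("a1", "a2", "a3") for b in ("b1", "b2")]
    train, test, held_out = out_of_sample_split(toy, held_out_fraction=0.2, seed=0)
    print("held out:", held_out)
    print("train:", train)
    print("test:", test)
```

Repeating the call with different `seed` values would mimic evaluating over several random runs; a random split, by contrast, would shuffle reactions directly without reference to their molecules.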