Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
RESuM: A Rare Event Surrogate Model for Physics Detector Design
Authors: Ann-Kathrin Schuetz, Alan Poon, Aobo Li
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We applied RESuM to optimize neutron moderator designs for the LEGEND NLDBD experiment, identifying an optimal design that reduces neutron background by (66.5 ± 3.5)% while using only 3.3% of the computational resources compared to traditional methods. Sections such as '5 EXPERIMENT AND RESULT' and '6 MODEL VALIDATION AND BENCHMARKING' describe the empirical studies, including quantitative results, comparisons to baselines, and evaluation metrics. |
| Researcher Affiliation | Academia | 1Nuclear Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA 2Halıcıoğlu Data Science Institute, Department of Physics, UC San Diego, La Jolla, CA 92093, USA. All listed affiliations are with academic institutions or public research laboratories. |
| Pseudocode | No | The paper describes the RESuM model and its components (Conditional Neural Process, Multi-Fidelity Gaussian Process, Active Learning, Adaptive Importance Sampling) through prose and mathematical equations in sections like '4 RARE EVENT SURROGATE MODEL' and its subsections, and appendices '11 MULTI-FIDELITY GAUSSIAN PROCESS', '12 ACTIVE LEARNING STRATEGY', '13 CONDITIONAL NEURAL PROCESS', and '15 ADAPTIVE IMPORTANCE SAMPLING'. However, it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | GitHub repository: https://github.com/annkasch/resum-legend. Additionally, the '10 REPRODUCIBILITY STATEMENT' section states: 'The code of this work is anonymized and released as the supplementary material of this submission.' |
| Open Datasets | No | The '10 REPRODUCIBILITY STATEMENT' section mentions: 'The training data of this work is too large as it involves in expensive simulations. The authors plan to release training data in the camera-ready version.' This indicates that the dataset is not publicly available at the time of publication. |
| Dataset Splits | Yes | Section 5.1 'NEUTRON MODERATOR SIMULATIONS' states: 'In total, 4 HF and 304 LF simulations were generated to form the training dataset for the surrogate model'. Section 6 'MODEL VALIDATION AND BENCHMARKING' mentions: 'we generated 100 out-of-sample HF simulations at randomly sampled θ values [for validation]'. Section 14 'MODEL BENCHMARKING DETAILS' specifies: 'The training dataset is identical to the RESuM training data described in Section 5, comprising 310 LF and 10 HF simulations. We further divided this dataset into three subsets: Small (305 LF + 5 HF), medium (307 LF + 7 HF), and full (310 LF + 10 HF).' It also describes: 'we performed 100 iterations, where each iteration involved randomly splitting the 410 LF and 110 HF samples into a full training dataset and a validation set (100 HF samples).' |
| Hardware Specification | No | The '9 ACKNOWLEDGEMENT' section states: 'The work at Lawrence Berkeley National Laboratory (LBNL), including computational resources provided by the National Energy Research Scientific Computing Center (NERSC), is supported by the U.S. Department of Energy (DOE) under Federal Prime Agreement DE-AC02-05CH11231.' This mentions the computing facility (NERSC) but does not provide specific hardware details such as exact GPU/CPU models or memory specifications. |
| Software Dependencies | No | Section 5.1 mentions: 'we utilized a Monte Carlo (MC) simulation package based on the GEANT-4 toolkit (Agostinelli et al., 2003; Allison et al., 2006), integrated with the existing LEGEND software frameworks (Neuberger; Ramachers and Morgan).' Section 5.3 states: 'The MFGP model was carried out by using the Emukit python library (Paleyes et al., 2023).' While these software tools are identified, specific version numbers for GEANT-4, LEGEND software frameworks, or Emukit are not explicitly provided. |
| Experiment Setup | No | Section 5.2 'CONDITIONAL NEURAL PROCESS RESULT' details some aspects of CNP training, such as: 'Training is performed using supervised learning, where a signal label (1) is assigned to neutrons that successfully produce 77(m)Ge background, and a background label (0) is assigned to neutrons that do not.' It also describes the 'mixup' technique, where 'λ is randomly drawn from a beta distribution B(0.1, 0.1).' Section 5.3 'RESUM RESULT' mentions: 'six active learning iterations were performed'. However, concrete hyperparameters such as learning rates, batch sizes, optimizer choices, or detailed neural network architectures for the CNP are not explicitly provided in the main text. |
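The mixup augmentation quoted in the Experiment Setup row (with λ drawn from B(0.1, 0.1)) follows the standard recipe of Zhang et al. (2018). A minimal sketch, not the authors' code; the function name and batch-pairing strategy are illustrative assumptions:

```python
import numpy as np

def mixup_batch(x, y, alpha=0.1, rng=None):
    """Generic mixup sketch: convex combinations of randomly paired
    inputs and their labels, with mixing weight lambda ~ Beta(alpha, alpha).
    alpha=0.1 matches the B(0.1, 0.1) distribution quoted from the paper."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)        # single mixing coefficient per batch
    perm = rng.permutation(len(x))      # random pairing within the batch
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]  # soft labels in [0, 1]
    return x_mix, y_mix
```

With binary signal/background labels (1 for neutrons producing 77(m)Ge background, 0 otherwise), the mixed labels become soft targets in [0, 1]; with α = 0.1 the Beta distribution concentrates λ near 0 or 1, so most mixed samples stay close to one original example.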