Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

AutoSciDACT: Automated Scientific Discovery through Contrastive Embedding and Hypothesis Testing

Authors: Sam Bright-Thonney, Christina Reissel, Gaia Grosso, Nathaniel Woodward, Katya Govorkova, Andrzej Novak, Sangeon Park, Eric Moreno, Philip Harris

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We perform experiments across a range of astronomical, physical, biological, image, and synthetic datasets, demonstrating strong sensitivity to small injections of anomalous data across all domains. ... We validate our approach using synthetic benchmarks and real-world datasets from astronomy, physics and biomedical domains.
Researcher Affiliation	Academia	1 Department of Physics, Massachusetts Institute of Technology; 2 The NSF AI Institute for Artificial Intelligence and Fundamental Interactions; 3 Department of Physics, University of Wisconsin, Madison
Pseudocode	No	The paper describes the NPLM algorithm and contrastive learning mathematically and in paragraph text, but does not provide a clearly labeled pseudocode or algorithm block.
Open Source Code	Yes	Our repository is publicly hosted on GitHub, and we will write detailed instructions about running code and reproducing experiments in the README. We will include the GitHub link in the final version of this paper, after double-blind review.
Open Datasets	Yes	Astronomy: For an astronomical baseline we choose gravitational wave data recorded by the Laser Interferometer Gravitational-Wave Observatories (LIGO) in Hanford, WA and Livingston, LA [54]. ... Particle Physics: Our particle physics baseline is JETCLASS [59, 60] ... Histology: ... publicly available optical microscope images from stained tissue samples [63]. ... Images: We use the CIFAR-10 dataset [65] ... We run AutoSciDACT on both real and simulated events from the CMS Open Data [124] for 2011 and 2012.
Dataset Splits	Yes	The available data is split into a training, validation and test dataset with the test dataset not only utilized for testing the pre-training performance, but also for constructing the reference R and data distribution D for the hypothesis test. ... The dataset is split into training, validation, and testing sets with a ratio of 70%/20%/10%. ... We adopt a fixed split of 80% training, 10% validation and 10% test sets.
Hardware Specification	Yes	All of the experiments presented in this paper were run on an academic computing cluster. The contrastive trainings were run on a single NVIDIA A100 GPU in all cases, and none took more than a few hours to compute.
Software Dependencies	No	The paper mentions software like the 'GPU-accelerated Falkon package [77, 78]' and 'AdamW optimizer [87]' but does not provide specific version numbers for these or other software components like programming languages or libraries.
Experiment Setup	Yes	The loss function used was SimCLR with τ = 0.01 and λ_CE = 0.5, a learning rate of 0.001 with a batch size of 1000 is used along with a cosine annealing. Trainings are performed over 50 epochs and take roughly 5 minutes on a CPU. ... The optimization of the backbone encoder f_θ, a one-dimensional ResNet with about 7.2M trainable weights, uses the combined loss objective (SimCLR temperature τ = 0.5, λ_CE = 0.5) and the AdamW optimizer [87] with an initial learning rate of 0.001 and 350 batch size. To facilitate improved convergence and generalization, a cosine annealing learning rate schedule is employed. The training is set up for a maximum of 25 epochs... We train the encoder with a SimCLR temperature τ = 0.1 and a classifier strength of λ_CE = 0.1, running for 100 epochs with an initial learning rate of 5 × 10⁻⁴ annealed to 10⁻⁵ on a cosine schedule and using the AdamW optimizer [88]. We use a batch size of 512.
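
The Experiment Setup row quotes a cosine-annealed learning rate for one of the trainings (100 epochs, 5 × 10⁻⁴ decaying to 10⁻⁵). The following is a minimal sketch of what such a schedule looks like, using the standard cosine annealing formula. The function name `cosine_annealed_lr` is hypothetical, and stepping once per epoch is an assumption: the quoted text does not say whether the authors update the rate per epoch or per iteration, and this is not taken from their code.

```python
import math


def cosine_annealed_lr(epoch: int, total_epochs: int,
                       lr_max: float, lr_min: float) -> float:
    """Learning rate at a 0-indexed epoch under standard cosine annealing.

    Hypothetical helper for illustration; assumes one update per epoch.
    """
    if total_epochs < 2:
        return lr_max
    # progress runs from 0.0 (first epoch) to 1.0 (last epoch)
    progress = epoch / (total_epochs - 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# Values quoted in the table: 100 epochs, 5e-4 annealed to 1e-5.
schedule = [cosine_annealed_lr(e, 100, 5e-4, 1e-5) for e in range(100)]

for e in (0, 25, 50, 75, 99):
    print(f"epoch {e:3d}: lr = {schedule[e]:.3e}")
```

The rate starts at the initial value, falls slowly at first, fastest around the midpoint, and flattens out as it approaches the final value. In a PyTorch training loop the same curve is usually obtained from `torch.optim.lr_scheduler.CosineAnnealingLR` rather than computed by hand.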