Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Distributional Training Data Attribution: What do Influence Functions Sample?

Authors: Bruno Kacper Mlodozeniec, Isaac Reid, Sam Power, David Krueger, Murat A Erdogdu, Richard E Turner, Roger Baker Grosse

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the practical utility of d-TDA in experiments, including improving data pruning for vision transformers and identifying influential examples with diffusion models. Figure 2: d-TDA demo for a neural network trained on UCI Concrete. Figure 3: Validating Theorem 2. Top: Correlation between changes in measurement predicted by unrolled differentiation and changes predicted by IFs, plotted against training time.
Researcher Affiliation	Academia	(1) University of Cambridge; (2) Max Planck Institute for Intelligent Systems; (3) University of Bristol; (4) Mila Quebec AI Institute; (5) University of Toronto; (6) Vector Institute; (7) Alan Turing Institute
Pseudocode	Yes	Alg. 1. Unrolled differentiation for d-TDA.
Open Source Code	Yes	The code will be open-sourced upon acceptance, together with the instructions on how to reproduce the main experiments. Anonymised source-code is included in the supplementary for the review.
Open Datasets	Yes	UCI Concrete dataset [35], MNIST dataset [36], CIFAR-10 dataset, ArtBench
Dataset Splits	No	The paper mentions using "the full 50000 images from the train set of CIFAR-10" and removing 5000 datapoints for pruning tasks. It also mentions "Test accuracy improvements on CIFAR-10" and "validation loss", indicating the use of test and validation sets. However, it does not explicitly provide the specific ratios or absolute counts for the overall training, validation, and test splits for any of the datasets (e.g., 80/10/10 split).
Hardware Specification	No	The paper does not explicitly state any specific hardware details such as GPU models, CPU models, or memory specifications used for the experiments. It only broadly states, "The appendix briefly describes the computational resources used for the experiments" (from NeurIPS Paper Checklist item 8), but the appendix sections E.1 and E.2 do not provide these specific details.
Software Dependencies	No	The paper mentions software components like "torch.func.hvp" and "torch.linalg.pinv", implying the use of PyTorch. However, it does not provide specific version numbers for PyTorch or any other software libraries, environments, or programming languages used in the experiments.
Experiment Setup	Yes	Concrete | MLP. In this setting, we train a multi-layer perceptron (MLP) on a (1D target) regression setting on the UCI Concrete dataset [35]. The MLP, with an input size of 8, hidden dimensions of [128, 128, 128], and GeLU activation functions, was trained using Stochastic Gradient Descent (SGD) with a learning rate of 0.03 and momentum of 0.9. We applied a weight decay of 10^-5 and gradient clipping at 1.0. The model was trained for 580 iterations using a mean squared error (MSE) loss function and a batch size of 32. The initial 58 iterations (10% of the total) are dedicated to a linear learning rate warmup from 0.
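
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, with the warmup schedule written out explicitly. This is a minimal sketch and not the authors' code: the names `ConcreteMLPConfig` and `learning_rate` are hypothetical, the weight decay is read as 10^-5 (the exponent appears damaged in the extracted text), and the learning rate is assumed to stay constant at its peak after warmup, since the quoted passage does not say what follows the warmup phase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConcreteMLPConfig:
    """Hypothetical container for the values quoted in the Experiment Setup row."""
    input_size: int = 8
    hidden_dims: tuple = (128, 128, 128)
    activation: str = "gelu"
    optimizer: str = "sgd"
    peak_lr: float = 0.03
    momentum: float = 0.9
    weight_decay: float = 1e-5   # ASSUMPTION: "10 5" in the extracted text read as 10^-5
    grad_clip: float = 1.0
    total_iters: int = 580
    warmup_iters: int = 58       # 10% of total_iters
    batch_size: int = 32
    loss: str = "mse"


def learning_rate(step: int, cfg: ConcreteMLPConfig = ConcreteMLPConfig()) -> float:
    """Return the learning rate at a 0-indexed training step.

    Linear warmup from 0 to peak_lr over the first warmup_iters steps.
    ASSUMPTION: the rate is held at peak_lr afterwards; the quoted setup
    does not describe any post-warmup decay.
    """
    if not 0 <= step < cfg.total_iters:
        raise ValueError(f"step must be in [0, {cfg.total_iters}), got {step}")
    if step < cfg.warmup_iters:
        return cfg.peak_lr * step / cfg.warmup_iters
    return cfg.peak_lr
```

Under these assumptions, `learning_rate(0)` is 0, the rate reaches 0.03 at step 58, and it stays there through step 579. If the paper uses a decay schedule after warmup, only the final `return` would need to change.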