Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Taming Hyperparameter Sensitivity in Data Attribution: Practical Selection Without Costly Retraining

Authors: Weiyi Wang, Junwei Deng, Yuzheng Hu, Shiyuan Zhang, Xirui Jiang, Runting Zhang, Han Zhao, Jiaqi Ma

NeurIPS 2025

Reproducibility Variable Result LLM Response
Research Type Experimental In this work, we present the first large-scale empirical study to understand the hyperparameter sensitivity of common data attribution methods. ... We conduct the first comprehensive empirical study of hyperparameter sensitivity in data attribution, benchmarking a range of widely used methods across diverse settings, confirming the necessity of practical hyperparameter selection strategies.
Researcher Affiliation Academia 1 University of Michigan, Ann Arbor; 2 University of Illinois Urbana-Champaign
Pseudocode Yes Algorithm 1 Selecting λ with the surrogate indicator. Input: A candidate set C of λ, a subset T ⊆ Z of test examples. Output: A selected λ̂. 1: for λ ∈ C do 2: Compute ξ_{z,λ} for all z ∈ T; 3: ξ_{T,λ} ← (1/|T|) Σ_{z∈T} ξ_{z,λ}; 4: end for 5: λ̂ ← arg min_{λ∈C} |ξ_{T,λ} − 0.5|;
Open Source Code Yes Our code is publicly available at https://github.com/TRAIS-Lab/data-attribution-hp.
Open Datasets Yes We perform empirical evaluations on three standard benchmark settings: MNIST [25] with a multilayer perceptron (MLP), CIFAR-2 [24] with ResNet-9 [14], and WikiText2 [28] with the GPT2 [31], which consists of both image and text data with various model sizes. ... For the dataset we use: MNIST-10 dataset holds CC BY-SA 3.0 license; CIFAR-10 dataset holds CC-BY 4.0 license; WikiText2 dataset holds CC BY-SA 3.0 license. ... Music Transformer (MT) [2] on MAESTRO dataset [13].
Dataset Splits No The paper does not explicitly provide the training/validation/test splits for the primary datasets (MNIST, CIFAR-2, WikiText2, MAESTRO) used to train the base models. It describes how subsets are sampled for the attribution evaluation (e.g., 'sample s subsets A = {A1, ..., As}, where each Aj ⊆ S is sampled uniformly at random with fixed size a'), but not the initial splits for the main model training.
Hardware Specification Yes The experiments for the hyperparameter sensitivity analysis are done on 4 A100 GPUs in around 100 hours... The experiments for the surrogate indicator are done on an A40 GPU in around 10 hours...
Software Dependencies No The paper mentions using 'dattri library [8]' and models like 'GPT2 [31]' and 'ResNet-9 [14]', but does not specify version numbers for any software, libraries, or programming languages used.
Experiment Setup Yes Hyperparameter selection and search space. We consider both the common hyperparameters introduced in Section 2.1 and some critical method-specific hyperparameters. For TRAK, we experiment with regularization, projection-dimension, and training-epoch. For TracIn, we search for projection-dimension, normalization, and checkpoint-selection. For IF, we analyze regularization and training-epoch, as well as max-iteration for the CG variant, and scaling and recursion-depth for the LiSSA variant. For LoGra, we search for regularization, projection-dimension, and training-epoch. We design the search space for each hyperparameter around its default value proposed by the original papers. The detailed definitions and the search space of the hyperparameters are stated in Appendix A.1. ... Table 1: Default values of hyperparameters.
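
The selection loop quoted in the Pseudocode row (Algorithm 1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not say how the surrogate indicator ξ is computed, so it is passed in as a caller-supplied function. The names `select_lambda`, `indicator`, and `toy_indicator` are hypothetical, and `toy_indicator` is an invented stand-in used only to make the example runnable.

```python
from statistics import fmean
from typing import Callable, Sequence, TypeVar

Z = TypeVar("Z")  # type of a single test example


def select_lambda(
    candidates: Sequence[float],
    test_subset: Sequence[Z],
    indicator: Callable[[Z, float], float],
) -> float:
    """Return the candidate whose mean indicator over the subset is closest to 0.5.

    `indicator(z, lam)` is an assumed, caller-supplied function that returns
    the surrogate indicator value for test example z at regularization lam.
    """
    if not candidates or not test_subset:
        raise ValueError("candidates and test_subset must be non-empty")

    def gap(lam: float) -> float:
        # Steps 2-3: average the indicator over all test examples in the subset.
        mean_xi = fmean(indicator(z, lam) for z in test_subset)
        # Step 5 criterion: distance of the averaged indicator from 0.5.
        return abs(mean_xi - 0.5)

    # Step 5: arg min over the candidate set (first candidate wins on ties).
    return min(candidates, key=gap)


def toy_indicator(z: float, lam: float) -> float:
    # Hypothetical stand-in that maps (z, lam) into (0, 1); NOT the paper's xi.
    return lam / (lam + z)


selected = select_lambda([0.01, 0.1, 1.0, 10.0], [0.5, 2.0], toy_indicator)
print(selected)
```

With a real attribution method, `indicator` would wrap whatever routine produces ξ for a given test example and λ; the selection logic itself needs no retraining, only one indicator evaluation per (example, candidate) pair.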