reproducibilityindex.ai

Understanding Influence Functions and Datamodels via Harmonic Analysis

Authors: Nikunj Saunshi, Arushi Gupta, Mark Braverman, Sanjeev Arora

ICLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments are conducted on the CIFAR-10 data to test the estimation procedure and the quality of the linear fit in Figures 1 and 2. We use the FFCV library Leclerc et al. (2022) to train models on CIFAR-10; each model takes 30s to train on our GPUs.
Researcher Affiliation	Academia	Nikunj Saunshi, Arushi Gupta, Mark Braverman, Sanjeev Arora Department of Computer Science, Princeton University {nsaunshi, arushig, mbraverm, arora}@cs.princeton.edu
Pseudocode	Yes	Algorithm 1 Efficient algorithm for residual estimation
Open Source Code	No	The paper references the FFCV library (Leclerc et al., 2022) which is open-source, but does not state that the authors' own code for this paper's methodology or experiments is being released or is publicly available.
Open Datasets	Yes	Experiments are conducted on the CIFAR-10 data to test the estimation procedure and the quality of the linear fit in Figures 1 and 2.
Dataset Splits	No	The paper mentions training on subsets of CIFAR-10 and refers to 'test examples', but it does not specify the train/validation/test splits or their sizes, beyond stating that models were trained on subsets of size 5,000 drawn from a 10k-image subset.
Hardware Specification	No	The paper vaguely mentions "train on our GPUs" but does not provide specific GPU models, CPU models, or other detailed hardware specifications.
Software Dependencies	No	The paper mentions using "the FFCV library Leclerc et al. (2022)" but does not specify a version number for it or any other software dependencies.
Experiment Setup	Yes	We use the default ResNet-based architecture in FFCV with a batch size of 512, an initial learning rate of 0.5, 24 epochs, weight decay of 5e-4 and SGD with momentum as the optimizer.
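The hyperparameters in the Experiment Setup row can be collected into a small configuration object. The sketch below is hypothetical: the `TrainConfig` name and its fields come from neither the paper nor FFCV. It records only the reported values, leaves momentum out because no value is given, and derives the number of optimizer steps for a 5,000-image training subset. Whether the final partial batch is dropped is not stated, so that assumption is exposed as a flag.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical record of the reported FFCV ResNet training settings.

    Momentum is not reported, so it is deliberately left out rather than guessed.
    """
    batch_size: int = 512
    initial_lr: float = 0.5
    epochs: int = 24
    weight_decay: float = 5e-4
    optimizer: str = "SGD with momentum"
    train_subset_size: int = 5000  # each model trains on 5,000 of the 10k pooled images

    def steps_per_epoch(self, drop_last: bool = False) -> int:
        # Optimizer steps for one pass over the subset; drop_last is an assumption flag.
        if drop_last:
            return self.train_subset_size // self.batch_size
        return math.ceil(self.train_subset_size / self.batch_size)

    def total_steps(self, drop_last: bool = False) -> int:
        # Total optimizer steps over all epochs.
        return self.epochs * self.steps_per_epoch(drop_last)


cfg = TrainConfig()
print(cfg.steps_per_epoch(), cfg.total_steps())  # 10 240
```

With a batch size of 512, a 5,000-image subset gives 10 steps per epoch if the last partial batch is kept (240 steps over 24 epochs), or 9 steps (216 total) if it is dropped.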
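The Dataset Splits and Research Type rows describe training many models on random 5,000-image subsets of a 10k pool and checking how well a linear function of the chosen subset fits the models' outputs. What follows is a minimal sketch of that kind of check under stated assumptions. It uses a toy pool of 20 examples, replaces trained CIFAR-10 models with synthetic outputs, and swaps in plain least squares rather than whatever estimator the paper uses. The helper names (`sample_subset_masks`, `fit_linear_datamodel`, `r_squared`) are hypothetical.

```python
import numpy as np


def sample_subset_masks(pool_size, subset_size, num_models, rng):
    """Return a (num_models, pool_size) 0/1 matrix; row m marks the examples model m trains on."""
    masks = np.zeros((num_models, pool_size))
    for m in range(num_models):
        chosen = rng.choice(pool_size, size=subset_size, replace=False)
        masks[m, chosen] = 1.0
    return masks


def fit_linear_datamodel(masks, outputs):
    """Ordinary least squares for outputs ~ bias + masks @ weights."""
    design = np.hstack([np.ones((masks.shape[0], 1)), masks])
    coef, *_ = np.linalg.lstsq(design, outputs, rcond=None)
    return coef[0], coef[1:]


def r_squared(masks, outputs, bias, weights):
    """Fraction of output variance explained by the linear fit."""
    residuals = outputs - (bias + masks @ weights)
    return 1.0 - np.sum(residuals**2) / np.sum((outputs - outputs.mean()) ** 2)


# Toy stand-in for the real setup (10k pool, 5,000-image subsets, trained CIFAR-10 models).
rng = np.random.default_rng(0)
pool_size, subset_size, num_models = 20, 10, 400
masks = sample_subset_masks(pool_size, subset_size, num_models, rng)

# Hypothetical "model output" on one test example: linear in the subset plus small noise.
true_weights = rng.normal(scale=0.1, size=pool_size)
outputs = 0.3 + masks @ true_weights + rng.normal(scale=0.01, size=num_models)

# Fit on 300 models, then measure fit quality on the 100 held-out models.
bias, weights = fit_linear_datamodel(masks[:300], outputs[:300])
r2_test = r_squared(masks[300:], outputs[300:], bias, weights)
print(f"held-out R^2: {r2_test:.3f}")
```

Every subset has the same size, so the all-ones column equals the mask row sums divided by the subset size. As a result, the split between bias and weights is not unique: adding c to every weight and subtracting 10c from the bias leaves every prediction unchanged. `np.linalg.lstsq` returns the minimum-norm solution in that case, and the held-out R² is unaffected.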