Influence Functions in Deep Learning Are Fragile

Authors: Samyadeep Basu, Phillip Pope, Soheil Feizi

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we provide a comprehensive and large-scale empirical study of successes and failures of influence functions in neural network models trained on datasets such as Iris, MNIST, CIFAR-10 and ImageNet. Through our extensive experiments, we show that the network architecture, its depth and width, as well as the extent of model parameterization and regularization techniques have strong effects in the accuracy of influence functions. (The influence-function estimate the paper studies is sketched in code after the table.)
Researcher Affiliation | Academia | Samyadeep Basu, Phillip Pope & Soheil Feizi, Department of Computer Science, University of Maryland, College Park. {sbasu12,pepope,sfeizi}@cs.umd.edu
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing its source code, nor does it link to a code repository.
Open Datasets | Yes | Datasets: We first study the behaviour of influence functions in a small Iris dataset (Anderson, 1936)... we use small MNIST (Koh & Liang, 2017)... trained on the standard MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky et al., 2000) datasets. Finally, ...we use ImageNet (Deng et al., 2009). (All four are publicly available; see the dataset-loading sketch after the table.)
Dataset Splits | No | The paper mentions training and testing on MNIST, CIFAR-10, and ImageNet, and refers to "top-5 validation accuracy" for ImageNet. However, it does not explicitly specify training/validation/test split percentages or exact counts for any dataset, nor does it cite a source for predefined splits.
Hardware Specification | No | The paper does not provide any details about the hardware used for the experiments (e.g., GPU models, CPU types, or cloud instance specifications).
Software Dependencies | No | The paper mentions using a PyTorch pretrained model in the appendix, but it does not specify any software with version numbers (e.g., PyTorch, Python, or other libraries).
Experiment Setup | Yes | We train models to convergence for 60k iterations with full-batch gradient descent. To obtain the ground-truth estimates, we retrain the models for 7.5k steps, starting from the optimal model parameters... For the network trained with weight-decay, we observe a Spearman correlation of 0.97... a damping factor of 0.001 is added to the Hessian matrix... The model has 2600 parameters and is trained for 500k iterations... a regularization factor of 0.001 is used. (The leave-one-out evaluation protocol is sketched in code after the table.)
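
For context on the estimate the paper evaluates: following Koh & Liang (2017), the influence of a training point z on the loss at a test point z_test is I(z, z_test) = -∇L(z_test)ᵀ H⁻¹ ∇L(z), where H is the Hessian of the training loss at the converged parameters and a small damping term (0.001 in the paper) is added to H. The sketch below is a minimal toy illustration, not the authors' code: the logistic-regression model, data, and hyperparameters are invented for the example, and the Hessian is formed explicitly, which is only feasible for very small models such as the paper's 2,600-parameter networks.

```python
import torch

# Toy setup: logistic regression on 2-D inputs. All names and values here
# are illustrative, not taken from the paper.
torch.manual_seed(0)
X = torch.randn(20, 2)
y = (X[:, 0] > 0).float()
w = torch.zeros(2, requires_grad=True)

def loss_at(w, x, t):
    return torch.nn.functional.binary_cross_entropy_with_logits(x @ w, t)

# Train to (approximate) convergence with full-batch gradient descent,
# mirroring the paper's setup for its small models.
opt = torch.optim.SGD([w], lr=0.5)
for _ in range(2000):
    opt.zero_grad()
    loss_at(w, X, y).backward()
    opt.step()

# Hessian of the training loss at the optimum, with the damping term the
# paper describes (0.001 added to the Hessian diagonal).
H = torch.autograd.functional.hessian(lambda v: loss_at(v, X, y), w.detach())
H_damped = H + 1e-3 * torch.eye(H.shape[0])

# Influence of training point z_i on a test point (Koh & Liang, 2017):
# I(z_i, z_test) = -grad L(z_test)^T H^{-1} grad L(z_i)
x_test, y_test = torch.randn(2), torch.tensor(1.0)
g_test = torch.autograd.grad(
    loss_at(w, x_test.unsqueeze(0), y_test.unsqueeze(0)), w)[0]
ihvp = torch.linalg.solve(H_damped, g_test)  # inverse-Hessian-vector product

for i in range(3):  # influence of the first few training points
    g_i = torch.autograd.grad(loss_at(w, X[i:i + 1], y[i:i + 1]), w)[0]
    print(f"influence of point {i}: {(-g_i @ ihvp).item():+.4f}")
```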
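
On the Open Datasets row: all four datasets named in the paper are standard and publicly downloadable. The loaders below (scikit-learn and torchvision) are one common way to fetch them; the paper does not say which tooling it used, so treat this purely as a pointer.

```python
from sklearn.datasets import load_iris
from torchvision import datasets

# Iris (Anderson, 1936): 150 samples, 4 features.
iris = load_iris()
print(iris.data.shape, iris.target.shape)

# MNIST (LeCun et al., 1998) and CIFAR-10 download on first use.
mnist = datasets.MNIST(root="data", train=True, download=True)
cifar = datasets.CIFAR10(root="data", train=True, download=True)
print(len(mnist), len(cifar))

# ImageNet (Deng et al., 2009) requires a manual download and license
# agreement before torchvision can load it:
# imagenet = datasets.ImageNet(root="data/imagenet", split="val")
```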
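
The Experiment Setup row describes the paper's ground-truth protocol: remove a training point, retrain starting from the optimal parameters, measure the change in test loss, and score agreement with the influence estimates via Spearman rank correlation. Continuing from the first sketch above (and reusing its toy variables `X`, `y`, `w`, `loss_at`, `ihvp`, `x_test`, `y_test`), a minimal version of that loop might look as follows; the step count and learning rate are illustrative, not the paper's.

```python
import torch  # continues the first sketch above; its variables are reused
from scipy.stats import spearmanr

def retrain_without(i, w_star, steps=500, lr=0.5):
    # Leave-one-out retraining, warm-started at the converged parameters
    # (the paper retrains for 7.5k steps; 500 is enough for this toy case).
    keep = torch.ones(len(X), dtype=torch.bool)
    keep[i] = False
    w_i = w_star.clone().detach().requires_grad_(True)
    opt = torch.optim.SGD([w_i], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss_at(w_i, X[keep], y[keep]).backward()
        opt.step()
    return w_i.detach()

base_loss = loss_at(w.detach(), x_test.unsqueeze(0), y_test.unsqueeze(0)).item()
estimates, ground_truth = [], []
for i in range(len(X)):
    g_i = torch.autograd.grad(loss_at(w, X[i:i + 1], y[i:i + 1]), w)[0]
    estimates.append((-g_i @ ihvp).item())
    w_i = retrain_without(i, w)
    loo_loss = loss_at(w_i, x_test.unsqueeze(0), y_test.unsqueeze(0)).item()
    # Removing z_i changes the test loss by roughly -(1/n) * I(z_i, z_test);
    # the 1/n scale does not affect the rank correlation, only the sign does.
    ground_truth.append(base_loss - loo_loss)

rho, _ = spearmanr(estimates, ground_truth)
print(f"Spearman correlation with leave-one-out ground truth: {rho:.3f}")
```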