If Influence Functions are the Answer, Then What is the Question?

Authors: Juhan Bae, Nathan Ng, Alston Lo, Marzyeh Ghassemi, Roger B. Grosse

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments investigate the following questions: (1) What factors discussed in Section 4 contribute most to the misalignment between influence functions and LOO retraining? (2) While influence functions fail to approximate the effect of retraining, do they accurately approximate the PBRF? (3) How do changes in weight decay, damping, the number of total epochs, and the number of removed training examples affect each source of misalignment? (See the influence-function sketch below the table.)
Researcher Affiliation | Academia | Juhan Bae (1,2), Nathan Ng (1,2,3), Alston Lo (1,2), Marzyeh Ghassemi (3), Roger Grosse (1,2); 1: University of Toronto, 2: Vector Institute, 3: Massachusetts Institute of Technology. {jbae, nng, rgrosse}@cs.toronto.edu; alston.lo@mail.utoronto.ca; mghassem@mit.edu
Pseudocode | No | The paper provides theoretical derivations and experimental details but does not include any pseudocode or algorithm blocks.
Open Source Code | No | (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] We have not attached the code.
Open Datasets | Yes | We analyzed the logistic regression (LR) model trained on the Cancer and Diabetes classification datasets from the UCI collection [Dua and Graff, 2017]. ... image classification on 10% of the MNIST [Deng, 2012] and Fashion MNIST [Xiao et al., 2017] datasets... Autoencoder. Next, we applied our framework to an 8-layer autoencoder (AE) on the full MNIST dataset. ... LeNet [Lecun et al., 1998], AlexNet [Krizhevsky et al., 2012], VGG-13 [Simonyan and Zisserman, 2014], and ResNet-20 [He et al., 2015] were trained on 10% of the MNIST dataset and the full CIFAR-10 [Krizhevsky, 2009] dataset. ... Transformer language models on the Penn Treebank (PTB) [Marcus et al., 1993] dataset. (See the dataset-loading sketch below the table.)
Dataset Splits | No | For image classification (MNIST, Fashion MNIST, CIFAR-10) and regression (Concrete, Energy), we use the full training dataset to train the model and report performance on the test dataset.
Hardware Specification | Yes | All experiments are run on NVIDIA GeForce RTX 3090 GPUs.
Software Dependencies | No | We implement our models using PyTorch [Paszke et al., 2019] and JAX [Bradbury et al., 2018].
Experiment Setup | Yes | We trained the networks for 1000 epochs using stochastic gradient descent (SGD) with a batch size of 128 and set a damping strength of λ = 0.001. ... We trained the network for 1000 epochs using SGD with momentum. We set the batch size to 1024, used L2 regularization of 10^-5 with a damping factor of λ = 0.001. ... We trained the base network for 200 epochs on both datasets with a batch size of 128. For MNIST, we kept the learning rate fixed throughout training, while for CIFAR-10, we decayed the learning rate by a factor of 5 at epochs 60, 120, and 160, following Zagoruyko and Komodakis [2016]. We used L2 regularization with strength 5 × 10^-4 and a damping factor of λ = 0.001. (See the training-schedule sketch below the table.)
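
The research questions in the Research Type row turn on how well a damped influence-function estimate matches LOO retraining and the PBRF. The following is a minimal sketch, not the authors' code, of the damped influence-function estimate of the parameter change from removing one training example, for a toy logistic-regression model. Only the damping value λ = 0.001 is taken from the paper; the model, data, and all other choices are illustrative.

```python
# Hypothetical sketch of a damped influence-function estimate (not the authors' code).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, d, lam = 32, 5, 1e-3                      # damping λ = 0.001, as in the paper
X = torch.randn(n, d)
y = torch.randint(0, 2, (n,)).float()
# Assume w has been trained to (near-)optimality; zeros are used only so the snippet runs.
w = torch.zeros(d, requires_grad=True)

def mean_train_loss(w):
    return F.binary_cross_entropy_with_logits(X @ w, y)

def removed_example_loss(w):
    return F.binary_cross_entropy_with_logits(X[0] @ w, y[0])

H = torch.autograd.functional.hessian(mean_train_loss, w)   # d x d Hessian of the mean loss
H_damped = H + lam * torch.eye(d)                            # add damping λ·I
g = torch.autograd.grad(removed_example_loss(w), w)[0]       # gradient of the removed example
delta_w = torch.linalg.solve(H_damped, g) / n                # predicted parameter change
print(delta_w)
```

In the paper's framing, this estimate is compared both against actual LOO retraining and against the proximal Bregman response function (PBRF); the sketch only shows how the damped estimate itself is formed.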
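The Open Datasets row mentions training on 10% of MNIST. As an illustration only (the paper releases no code and does not specify its subsampling procedure), here is one way such a subset could be built with torchvision; the random subsampling and the seed are assumptions.

```python
# Illustrative only: constructing a 10% MNIST subset with torchvision.
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

full = datasets.MNIST("data", train=True, download=True,
                      transform=transforms.ToTensor())
g = torch.Generator().manual_seed(0)                 # seed is an assumption
idx = torch.randperm(len(full), generator=g)[: len(full) // 10]
mnist_10pct = Subset(full, idx.tolist())
print(len(mnist_10pct))                              # 6000 examples
```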
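The Experiment Setup row reports, for the CIFAR-10 classifiers, 200 epochs, batch size 128, L2 strength 5 × 10^-4, and a learning-rate decay by a factor of 5 at epochs 60, 120, and 160. A hedged PyTorch sketch of that schedule follows; the base learning rate, the momentum value, and the ResNet-18 stand-in (torchvision does not ship the ResNet-20 used in the paper) are assumptions, not reported values.

```python
# Sketch of the reported CIFAR-10 training schedule; placeholder hyperparameters noted below.
import torch
import torchvision

model = torchvision.models.resnet18(num_classes=10)     # stand-in for ResNet-20 (assumption)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,   # lr and momentum assumed
                            weight_decay=5e-4)           # L2 strength 5e-4, as reported
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160], gamma=0.2)     # decay by a factor of 5, as reported

for epoch in range(200):                                  # 200 epochs, as reported
    # ... one training epoch over CIFAR-10 with batch size 128 goes here ...
    scheduler.step()
```

MultiStepLR with gamma=0.2 reproduces the "decay by a factor of 5" schedule; the damping factor λ = 0.001 mentioned in the row applies to the influence-function computation, not to this training loop.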