Scaling Up Influence Functions

Authors: Andrea Schioppa, Polina Zablotskaia, David Vilar, Artem Sokolov
Pages: 8179-8186

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our approach on image classification and sequence-to-sequence tasks with tens to a hundred of millions of training examples."
Researcher Affiliation | Industry | "Andrea Schioppa, Polina Zablotskaia, David Vilar, Artem Sokolov Google Research {arischioppa, polinaz, vilar, artemsok}@google.com"
Pseudocode | Yes | "Algorithm 1: Arnoldi"
Open Source Code | Yes | "Our code will be available at https://github.com/googleresearch/jax-influence."
Open Datasets | Yes | "We evaluate our approach on image classification and sequence-to-sequence tasks with tens to a hundred of millions of training examples.", "on 14M (ImageNet) and 100M (Paracrawl) training examples.", and "small MNIST dataset (LeCun, Cortes, and Burges 1994)"
Dataset Splits | Yes | "to be able to compare all baselines we pick the small MNIST dataset (LeCun, Cortes, and Burges 1994) and consider two CNNs of different sizes: a small one that permits the exact Hessian calculation, and a larger one on which we can gauge the scalability potential. Because the influence calculation with LISSA and TracIn is slow, following (Koh and Liang 2017), we take two 10% subsamples of the original data for training and evaluation, and randomly relabel 20% of training examples to create a corrupted dataset to evaluate mislabeled example retrieval with influence estimates."
Hardware Specification | Yes | "trained it for 10 epochs on GPU V100"
Software Dependencies | No | The paper mentions software such as the 'Flax' and 'JAX' implementations, but it does not give version numbers for these or for any other software dependency required for reproducibility.
Experiment Setup | Yes | "First, to ensure convergence, the network is trained for more steps (500k) than one would normally do, with a large batch size of 500 images. Second, the ℓ2-regularization of 5·10⁻³ is introduced to make H positive definite."
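
The Dataset Splits entry describes taking two 10% subsamples and randomly relabeling 20% of the training examples. The following is a minimal sketch of one way such a corrupted split could be built, not the authors' code. The function name `subsample_and_corrupt` and its parameters are hypothetical. The sketch also makes two assumptions the quoted text does not state: that the two subsamples are disjoint, and that each relabeled example receives a different class chosen uniformly at random.

```python
import random


def subsample_and_corrupt(labels, num_classes, subsample_frac=0.10,
                          relabel_frac=0.20, seed=0):
    """Draw train/eval subsamples and corrupt part of the train labels.

    Hypothetical helper for illustration only. Assumes the two subsamples
    are disjoint and that a relabeled example always gets a *different*
    class; the source text specifies neither detail.
    """
    rng = random.Random(seed)
    n = len(labels)
    k = round(n * subsample_frac)  # size of each subsample

    # Sample 2k distinct indices, then split them into train and eval.
    picked = rng.sample(range(n), 2 * k)
    train_idx, eval_idx = picked[:k], picked[k:]

    # Choose which positions of the training subsample get a new label.
    num_flip = round(k * relabel_frac)
    flipped = sorted(rng.sample(range(k), num_flip))
    flipped_set = set(flipped)

    noisy_labels = []
    for pos, i in enumerate(train_idx):
        y = labels[i]
        if pos in flipped_set:
            # Pick uniformly among the other classes (assumption).
            y = rng.choice([c for c in range(num_classes) if c != y])
        noisy_labels.append(y)

    return train_idx, eval_idx, noisy_labels, flipped


if __name__ == "__main__":
    # Toy stand-in for MNIST labels: 1000 examples, 10 classes.
    toy_labels = [i % 10 for i in range(1000)]
    train_idx, eval_idx, noisy, flipped = subsample_and_corrupt(
        toy_labels, num_classes=10)
    print(len(train_idx), len(eval_idx), len(flipped))
```

The returned `flipped` positions act as ground truth for the retrieval task: an influence method can be scored by how highly it ranks those positions among the training subsample.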