Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Scaling Up Influence Functions
Authors: Andrea Schioppa, Polina Zablotskaia, David Vilar, Artem Sokolov (pp. 8179-8186)
AAAI 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on image classification and sequence-to-sequence tasks with tens to a hundred of millions of training examples. |
| Researcher Affiliation | Industry | Andrea Schioppa, Polina Zablotskaia, David Vilar, Artem Sokolov Google Research EMAIL |
| Pseudocode | Yes | Algorithm 1: Arnoldi |
| Open Source Code | Yes | Our code will be available at https://github.com/googleresearch/jax-influence. |
| Open Datasets | Yes | "We evaluate our approach on image classification and sequence-to-sequence tasks with tens to a hundred of millions of training examples." "on 14M (ImageNet) and 100M (Paracrawl) training examples." and "small MNIST dataset (LeCun, Cortes, and Burges 1994)" |
| Dataset Splits | Yes | to be able to compare all baselines we pick the small MNIST dataset (LeCun, Cortes, and Burges 1994) and consider two CNNs of different sizes: a small one that permits the exact Hessian calculation, and a larger one on which we can gauge the scalability potential. Because the influence calculation with LISSA and TracIn is slow, following (Koh and Liang 2017), we take two 10% subsamples of the original data for training and evaluation, and randomly relabel 20% of training examples to create a corrupted dataset to evaluate mislabeled example retrieval with influence estimates. |
| Hardware Specification | Yes | trained it for 10 epochs on GPU V100 |
| Software Dependencies | No | The paper mentions software like 'Flax' and 'JAX' implementations, but does not provide specific version numbers for these or any other software dependencies required for reproducibility. |
| Experiment Setup | Yes | First, to ensure convergence, the network is trained for more steps (500k) than one would normally do, with a large batch size of 500 images. Second, the ℓ2-regularization of $5 \cdot 10^{-3}$ is introduced to make H positive definite. |
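
The split described in the Dataset Splits row (two 10% subsamples, with 20% of the training subsample randomly relabeled) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the function name `make_corrupted_split`, its parameters, and the seed are hypothetical, and it assumes that the two subsamples are disjoint draws from a single pool and that every relabeled example receives a class different from its original one, neither of which the quoted text confirms.

```python
import random


def make_corrupted_split(labels, num_classes=10, subsample_frac=0.10,
                         relabel_frac=0.20, seed=0):
    """Build a label-noise benchmark split (hypothetical helper).

    Returns (train_idx, eval_idx, noisy_labels, flipped), where
    noisy_labels maps each training index to its possibly corrupted label
    and flipped is the set of training indices whose label was changed.
    """
    rng = random.Random(seed)
    n = len(labels)
    k = int(n * subsample_frac)

    # Assumption: the two subsamples are disjoint draws from one pool.
    drawn = rng.sample(range(n), 2 * k)
    train_idx, eval_idx = drawn[:k], drawn[k:]

    # Choose which training examples get a corrupted label.
    flipped = set(rng.sample(train_idx, int(k * relabel_frac)))

    noisy_labels = {}
    for i in train_idx:
        if i in flipped:
            # Assumption: the new label always differs from the original.
            candidates = [c for c in range(num_classes) if c != labels[i]]
            noisy_labels[i] = rng.choice(candidates)
        else:
            noisy_labels[i] = labels[i]
    return train_idx, eval_idx, noisy_labels, flipped


# Toy usage with 1000 synthetic labels over 10 classes.
toy_labels = [i % 10 for i in range(1000)]
train_idx, eval_idx, noisy_labels, flipped = make_corrupted_split(toy_labels)
```

Returning `flipped` keeps the ground truth needed to score mislabeled-example retrieval: training examples can be ranked by an influence estimate and checked against this set.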