Estimating Training Data Influence by Tracing Gradient Descent

Authors: Garima Pruthi, Frederick Liu, Satyen Kale, Mukund Sundararajan

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we compare TracIn with influence functions [2] and the representer point selection method [3]. We use an evaluation technique that has been used by previous papers on the topic (see Section 4.1 of [3] and Section 5.4 of [2]).
Researcher Affiliation | Industry | Garima Pruthi, Google, pruthi@google.com; Frederick Liu, Google, frederickliu@google.com; Satyen Kale, Google, satyenkale@google.com; Mukund Sundararajan, Google, mukunds@google.com
Pseudocode | No | The paper describes the TracIn method and its approximations mathematically and in prose, but it does not include any formal pseudocode or algorithm blocks (a hedged sketch of the checkpoint approximation follows the table).
Open Source Code | Yes | Code is available at [1]. [1] TracIn code: https://github.com/frederick0329/TracIn
Open Datasets | Yes | We work with ResNet-56 [16] trained on CIFAR-10 [17]. In this section, we work on the MNIST digit classification task. We study TracIn on a regression problem using the California housing prices dataset [18]. We apply TracIn on the DBPedia ontology dataset introduced in [19]. We apply TracIn on the fully connected layer of ResNet-50 trained on ImageNet [22].
Dataset Splits | No | The paper describes training and test sets used for evaluation, and in some cases train-test splits, but it does not explicitly mention a distinct validation set or how one would be split.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models or specific cloud instances.
Software Dependencies | No | The paper mentions using the sentencepiece library as tokenizer [21] but does not specify its version number or any other software dependencies with version information.
Experiment Setup | Yes | All models for CIFAR-10 are trained for 270 epochs with a batch size of 1000, an initial learning rate of 0.1, and the schedule (1.0, 15), (0.1, 90), (0.01, 180), (0.001, 240), with learning rate warmup applied in the first 15 epochs (a hedged reading of this schedule also follows the table). We used an 8:2 train-test split and trained a regression model with 3 hidden layers and 168K parameters, using the Adam optimizer to minimize MSE for 200 epochs.
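
The checkpoint approximation noted in the Pseudocode row (TracInCP in the paper) is compact enough to sketch: the influence of a training example z on a test example z' is the sum, over saved checkpoints, of the learning rate times the dot product of the loss gradients for z and z' at that checkpoint. The snippet below is a minimal NumPy sketch of that formula, not the authors' released implementation; `grad_loss` is an assumed helper that returns a flat per-example gradient vector at a given checkpoint.

```python
import numpy as np

def tracin_cp(checkpoints, learning_rates, grad_loss, z_train, z_test):
    """Checkpoint-based TracIn score (first-order approximation):
    sum_i lr_i * <grad_loss(w_i, z_train), grad_loss(w_i, z_test)>."""
    score = 0.0
    for params, lr in zip(checkpoints, learning_rates):
        g_train = grad_loss(params, z_train)  # flat loss gradient at z_train (assumed helper)
        g_test = grad_loss(params, z_test)    # flat loss gradient at z_test (assumed helper)
        score += lr * float(np.dot(g_train, g_test))
    return score
```

In the paper's terminology, a large positive score marks the training example as a proponent of the test example, and a large negative score marks it as an opponent.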
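
The CIFAR-10 learning rate schedule quoted in the Experiment Setup row is terse, so the sketch below shows one plausible reading: a base rate of 0.1 scaled by each (multiplier, start_epoch) pair once that epoch is reached, with linear warmup over the first 15 epochs. The exact warmup shape and the interpretation of the pairs are assumptions, not details confirmed by the paper or its code.

```python
def cifar_lr(epoch, base_lr=0.1,
             schedule=((1.0, 15), (0.1, 90), (0.01, 180), (0.001, 240)),
             warmup_epochs=15):
    """Piecewise-constant learning rate with linear warmup, under one
    reading of the (multiplier, start_epoch) pairs quoted above."""
    if epoch < warmup_epochs:
        # Assumed linear warmup toward the base learning rate.
        return base_lr * (epoch + 1) / warmup_epochs
    lr = base_lr
    for multiplier, start_epoch in schedule:
        if epoch >= start_epoch:
            lr = base_lr * multiplier
    return lr

# Under this reading, a 270-epoch run warms up to 0.1 over epochs 0-14,
# then uses 0.1 until epoch 90, 0.01 until 180, 0.001 until 240,
# and 0.0001 for the remaining epochs.
```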