Estimating Training Data Influence by Tracing Gradient Descent

Authors: Garima Pruthi, Frederick Liu, Satyen Kale, Mukund Sundararajan

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we compare TracIn with influence functions [2] and the representer point selection method [3]. We use an evaluation technique that has been used by previous papers on the topic (see Section 4.1 of [3] and Section 5.4 of [2]).
Researcher Affiliation | Industry | Garima Pruthi, Google, pruthi@google.com; Frederick Liu, Google, frederickliu@google.com; Satyen Kale, Google, satyenkale@google.com; Mukund Sundararajan, Google, mukunds@google.com
Pseudocode | No | The paper describes the TracIn method and its approximations mathematically and in prose, but it does not include any formal pseudocode or algorithm blocks (a hedged sketch of the checkpoint approximation follows the table).
Open Source Code | Yes | Code is available at [1]. [1] TracIn code: https://github.com/frederick0329/TracIn
Open Datasets | Yes | We work with ResNet-56 [16] trained on CIFAR-10 [17]. In this section, we work on the MNIST digit classification task. We study TracIn on a regression problem using the California housing prices dataset [18]. We apply TracIn on the DBPedia ontology dataset introduced in [19]. We apply TracIn on the fully connected layer of ResNet-50 trained on ImageNet [22].
Dataset Splits | No | The paper describes training and test sets used for evaluation, and in some cases train-test splits, but it does not explicitly mention a distinct validation set or how one would be split.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models or specific cloud instances.
Software Dependencies | No | The paper mentions using the sentencepiece library as tokenizer [21] but does not specify its version number or any other software dependencies with version information.
Experiment Setup | Yes | All models for CIFAR-10 are trained for 270 epochs with a batch size of 1000, an initial learning rate of 0.1, and the schedule (1.0, 15), (0.1, 90), (0.01, 180), (0.001, 240), with learning rate warmup applied in the first 15 epochs (a hedged reading of this schedule also follows the table). We used an 8:2 train-test split and trained a regression model with 3 hidden layers and 168K parameters, using the Adam optimizer to minimize MSE for 200 epochs.
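
The checkpoint approximation noted in the Pseudocode row (TracInCP in the paper) is compact enough to sketch: the influence of a training example z on a test example z' is the sum, over saved checkpoints, of the learning rate times the dot product of the loss gradients for z and z' at that checkpoint. The snippet below is a minimal NumPy sketch of that formula, not the authors' released implementation; `grad_loss` is an assumed helper that returns a flat per-example gradient vector at a given checkpoint.

```python
import numpy as np

def tracin_cp(checkpoints, learning_rates, grad_loss, z_train, z_test):
    """Checkpoint-based TracIn score (first-order approximation):
    sum_i lr_i * <grad_loss(w_i, z_train), grad_loss(w_i, z_test)>."""
    score = 0.0
    for params, lr in zip(checkpoints, learning_rates):
        g_train = grad_loss(params, z_train)  # flat loss gradient at z_train (assumed helper)
        g_test = grad_loss(params, z_test)    # flat loss gradient at z_test (assumed helper)
        score += lr * float(np.dot(g_train, g_test))
    return score
```

In the paper's terminology, a large positive score marks the training example as a proponent of the test example, and a large negative score marks it as an opponent.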
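
The CIFAR-10 learning rate schedule quoted in the Experiment Setup row is terse, so the sketch below shows one plausible reading: a base rate of 0.1 scaled by each (multiplier, start_epoch) pair once that epoch is reached, with linear warmup over the first 15 epochs. The exact warmup shape and the interpretation of the pairs are assumptions, not details confirmed by the paper or its code.

```python
def cifar_lr(epoch, base_lr=0.1,
             schedule=((1.0, 15), (0.1, 90), (0.01, 180), (0.001, 240)),
             warmup_epochs=15):
    """Piecewise-constant learning rate with linear warmup, under one
    reading of the (multiplier, start_epoch) pairs quoted above."""
    if epoch < warmup_epochs:
        # Assumed linear warmup toward the base learning rate.
        return base_lr * (epoch + 1) / warmup_epochs
    lr = base_lr
    for multiplier, start_epoch in schedule:
        if epoch >= start_epoch:
            lr = base_lr * multiplier
    return lr

# Under this reading, a 270-epoch run warms up to 0.1 over epochs 0-14,
# then uses 0.1 until epoch 90, 0.01 until 180, 0.001 until 240,
# and 0.0001 for the remaining epochs.
```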