A Bayesian Approach To Analysing Training Data Attribution In Deep Learning
Authors: Elisa Nguyen, Minjoon Seo, Seong Joon Oh
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate the rarity of such noise-independent training-test data pairs but confirm their existence. We introduce our experimental settings, present analyses on factors contributing to the reliability of TDA values, compare TDA methods, and draw suggestions on the evaluation practice of TDA. |
| Researcher Affiliation | Academia | Elisa Nguyen Tübingen AI Center University of Tübingen Minjoon Seo KAIST AI Seong Joon Oh Tübingen AI Center University of Tübingen |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is provided at https://github.com/ElisaNguyen/bayesian-tda. |
| Open Datasets | Yes | We use variants of MNIST [24] limited to three classes (MNIST3), and CIFAR10 [25]. |
| Dataset Splits | No | For MNIST3, we sample a training set of size 150 and a test set of size 900, i.e. 135,000 train-test pairs. For CIFAR10, we define the training and test set at size 500, i.e. 250,000 train-test pairs. |
| Hardware Specification | Yes | All experiments were run on a single Nvidia 2080ti GPU. |
| Software Dependencies | No | We use the PyTorch implementation of IF from Guo et al. [16] and modify it for our models. For training the ViT with LoRA, we use the peft [32] and Hugging Face transformers library [33]. |
| Experiment Setup | Yes | We use the Adam optimizer with a learning rate of 0.001 and a weight decay of 0.005. We use the cross-entropy loss and train the model for 15 epochs on MNIST3 and for 30 epochs on CIFAR10 with a batch size of 32. |
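The reported dataset sizes and training hyperparameters together imply a concrete training schedule and number of train-test attribution pairs. As a sanity check on the quoted numbers, the sketch below derives these quantities in plain Python (the `schedule` helper and the `setups` dict are illustrative names, not from the paper; only the sizes, epochs, and batch size are taken from the quoted text):

```python
import math

# Reported setup, quoted from the paper: MNIST3 uses 150 train / 900 test
# examples for 15 epochs; CIFAR10 uses 500 train / 500 test for 30 epochs.
# Both are trained with batch size 32 (Adam, lr=0.001, weight decay=0.005,
# cross-entropy loss).
setups = {
    "MNIST3":  {"train": 150, "test": 900, "epochs": 15, "batch_size": 32},
    "CIFAR10": {"train": 500, "test": 500, "epochs": 30, "batch_size": 32},
}

def schedule(cfg):
    """Derive the implied optimizer-step schedule and TDA pair count."""
    steps_per_epoch = math.ceil(cfg["train"] / cfg["batch_size"])
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * cfg["epochs"],
        # Every train example is paired with every test example for TDA.
        "train_test_pairs": cfg["train"] * cfg["test"],
    }

for name, cfg in setups.items():
    print(name, schedule(cfg))
```

The derived pair counts (135,000 for MNIST3 and 250,000 for CIFAR10) match the figures quoted in the Dataset Splits row, confirming internal consistency of the reported setup.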