Data Cleansing for Models Trained with SGD
Authors: Satoshi Hara, Atsushi Nitanda, Takanori Maehara
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments, we demonstrate that the proposed method can accurately infer the influential instances. Moreover, we used MNIST and CIFAR10 to show that the models can be effectively improved by removing the influential instances suggested by the proposed method. |
| Researcher Affiliation | Academia | satohara@ar.sanken.osaka-u.ac.jp, Osaka University, Japan nitanda@mist.i.u-tokyo.ac.jp, The University of Tokyo, Japan takanori.maehara@riken.jp, RIKEN AIP, Japan |
| Pseudocode | Yes | "Algorithm 1 LIE for SGD: Training Phase" and "Algorithm 2 LIE for SGD: Inference Phase" are presented on page 4. |
| Open Source Code | Yes | The codes are available at https://github.com/sato9hara/sgd-influence |
| Open Datasets | Yes | "We used three datasets: Adult [Dua and Karra Taniskidou, 2017], 20Newsgroups, and MNIST [LeCun et al., 1998]." and "We used MNIST and CIFAR10 [Krizhevsky and Hinton, 2009]." |
| Dataset Splits | Yes | "In the experiments, we randomly subsampled 200 instances for the training set D and validation set D′." and "From the original training set, we held out randomly selected 10,000 instances for the validation set and used the remaining instances as the training set." |
| Hardware Specification | Yes | The experiments were conducted on 64bit Ubuntu 16.04 with a six-core Intel Xeon E5-1650 3.6GHz CPU, 128GB RAM, and four GeForce GTX 1080 Ti GPUs. |
| Software Dependencies | Yes | We used Python 3 and PyTorch 1.0 for the experiments. |
| Experiment Setup | Yes | In SGD, we set the epoch K = 20, batch size |St| = 64, and learning rate ηt = 0.05. (A minimal sketch of this configuration follows the table.) |
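
For concreteness, the quoted split and SGD settings translate into roughly the following PyTorch sketch. This is a minimal illustration, not the authors' implementation (their code, including the influence-estimation algorithms, is at https://github.com/sato9hara/sgd-influence); the two-layer network is a hypothetical stand-in, and only the hyperparameters quoted above (K = 20 epochs, batch size |St| = 64, learning rate ηt = 0.05, the 10,000-instance validation holdout) come from the paper.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Hypothetical stand-in model; the paper's actual architectures live in
# the authors' repository.
model = nn.Sequential(nn.Linear(28 * 28, 100), nn.ReLU(), nn.Linear(100, 10))

# Split quoted above: hold out 10,000 randomly selected instances of the
# original MNIST training set for validation; train on the remainder.
full_train = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
train_set, val_set = random_split(full_train, [len(full_train) - 10000, 10000])

# SGD hyperparameters quoted above: K = 20 epochs, |S_t| = 64, eta_t = 0.05.
loader = DataLoader(train_set, batch_size=64, shuffle=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x.view(x.size(0), -1)), y)  # flatten 28x28 images
        loss.backward()
        optimizer.step()
```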