Resolving Training Biases via Influence-based Data Relabeling

Authors: Shuming Kong, Yanyan Shen, Linpeng Huang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on ten real-world datasets demonstrate RDIA outperforms the state-of-the-art data resampling methods and improves model's robustness against label noise."
Researcher Affiliation | Academia | "Shuming Kong, Yanyan Shen, Linpeng Huang, Department of Computer Science and Engineering, Shanghai Jiao Tong University, {leinuo123,shenyy,lphuang}@sjtu.edu.cn"
Pseudocode | Yes | "The algorithm of RDIA could be found in Appendix A."
Open Source Code | Yes | "Our code could be found in the https://github.com/Viperccc/RDIA."
Open Datasets | Yes | "All the datasets could be found in https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/."
Dataset Splits | Yes | "When training logistic regression, we randomly pick up 30% samples from the training set as the validation set. For different influence-based approaches, the training/validation/test sets are kept the same for fair comparison. ... When training deep models, due to the high time complexity of estimating influence functions, we randomly exclude 100 samples (1%) from the test sets of MNIST and CIFAR10 as the respective validation sets, and the remaining data is used for testing." (A minimal split sketch appears below the table.)
Hardware Specification | Yes | "We implemented all the comparison methods by using their published source codes in Pytorch and ran all the experiments on a server with 2 Intel Xeon 1.7GHz CPUs, 128 GB of RAM and a single NVIDIA 2080 Ti GPU."
Software Dependencies | No | The paper mentions 'Pytorch' in Section 5.1 but gives no version number. It also names the Adam and SGD optimizers and the Newton-CG and stochastic-estimation algorithms for computing influence functions, but lists no software libraries with specific versions. (A sketch of the stochastic inverse-Hessian-vector-product estimation appears below the table.)
Experiment Setup | Yes | "For logistic regression model, we select the regularization term C = 0.1 for fair comparison. We adopt the Adam optimizer with the learning rate of 0.001 to train the LeNet on MNIST. After calculating the influence functions and relabeling the identified harmful training samples using R, we reduce the learning rate to 10^-5 and update the models until convergence. For CIFAR10, we use the SGD optimizer with the learning rate of 0.01 and the momentum of 0.9 to train the CNN. ... The batch size is set to 64 in all the experiments and the hyperparameter α is tuned in [0, 0.001, 0.002, ..., 0.01] with the validation set for best performance." (An optimizer-configuration sketch appears below the table.)
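
To make the logistic-regression split described above concrete, here is a minimal sketch assuming generic NumPy feature/label arrays; the function name train_val_split and the fixed seed are illustrative, not part of the released RDIA code.

```python
import numpy as np

def train_val_split(X, y, val_frac=0.3, seed=0):
    """Randomly hold out val_frac of the training samples as a validation set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))      # shuffle sample indices
    n_val = int(val_frac * len(X))     # 30% of the training set by default
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]
```

Fixing the permutation seed keeps the training/validation/test partition identical across the different influence-based approaches, matching the fair-comparison requirement quoted in the Dataset Splits row.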
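
The "stochastic estimation" mentioned under Software Dependencies is commonly implemented as a LiSSA-style inverse-Hessian-vector product when computing influence functions for deep models. The following is a hedged sketch under that assumption; hvp, lissa_inverse_hvp, and the damping/scale/steps values are illustrative choices, not the authors' released implementation.

```python
import torch

def hvp(loss, params, vec):
    """Hessian-vector product via double backpropagation."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    return torch.autograd.grad(dot, params)

def lissa_inverse_hvp(model, loss_fn, loader, vec,
                      damping=0.01, scale=25.0, steps=100):
    """Approximate H^{-1} v with the recursion h <- v + (1 - damping) h - (H h) / scale."""
    params = [p for p in model.parameters() if p.requires_grad]
    vec = [v.detach() for v in vec]
    h = [v.clone() for v in vec]
    data_iter = iter(loader)
    for _ in range(steps):
        try:
            x, y = next(data_iter)
        except StopIteration:
            data_iter = iter(loader)     # recycle the loader on small datasets
            x, y = next(data_iter)
        loss = loss_fn(model(x), y)      # loss on a fresh mini-batch
        hv = hvp(loss, params, h)
        h = [v + (1 - damping) * h_i - hv_i / scale
             for v, h_i, hv_i in zip(vec, h, hv)]
    return [h_i / scale for h_i in h]    # unscale the final estimate
```

For convex models such as the regularized logistic regression, the Newton-CG route (solving the linear system against the exact Hessian) is typically used instead of this stochastic recursion.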
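
The optimizer settings quoted in the Experiment Setup row can be summarized as follows; make_optimizer and finetune_optimizer are hypothetical helper names, the model objects are placeholders, and the use of Adam for the post-relabeling fine-tuning stage is an assumption (the paper only states that the learning rate is reduced to 10^-5).

```python
import torch

BATCH_SIZE = 64                               # batch size used in all experiments
ALPHA_GRID = [i * 0.001 for i in range(11)]   # alpha tuned in [0, 0.001, ..., 0.01]

def make_optimizer(model, dataset):
    """Optimizer configuration for the initial training stage."""
    if dataset == "mnist":       # LeNet on MNIST
        return torch.optim.Adam(model.parameters(), lr=1e-3)
    if dataset == "cifar10":     # CNN on CIFAR10
        return torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
    raise ValueError(f"unknown dataset: {dataset}")

def finetune_optimizer(model):
    """After relabeling the identified harmful samples, update the model with a
    reduced learning rate of 1e-5 until convergence (optimizer choice assumed)."""
    return torch.optim.Adam(model.parameters(), lr=1e-5)
```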