Self-Adaptive Training: beyond Empirical Risk Minimization

Authors: Lang Huang, Chao Zhang, Hongyang Zhang

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on the CIFAR and ImageNet datasets verify the effectiveness of our approach in two applications: classification with label noise and selective classification."
Researcher Affiliation | Academia | "Lang Huang, Peking University, laynehuang@pku.edu.cn; Chao Zhang, Peking University, c.zhang@pku.edu.cn; Hongyang Zhang, TTIC, hongyanz@ttic.edu"
Pseudocode | Yes | "Algorithm 1 Self-Adaptive Training" (a hedged sketch of such a training loop is given after this table)
Open Source Code | Yes | "The code is available at https://github.com/LayneH/self-adaptive-training."
Open Datasets | Yes | "We conduct the experiments on the CIFAR10 and CIFAR100 datasets [18]... on the ImageNet under both standard setup (i.e., using original labels) and the case that 40% training labels are corrupted."
Dataset Splits | Yes | "In this section, we conduct the experiments on the CIFAR10 dataset [18], of which we split the original training data into a training set (consists of first 45,000 data pairs) and a validation set (consists of last 5,000 data pairs)."
Hardware Specification | No | The paper does not provide specific hardware details (such as CPU/GPU models or cloud instance types) used to run its experiments.
Software Dependencies | No | The paper states the networks are "implemented on PyTorch [28]" but does not specify a version for PyTorch or any other software dependency.
Experiment Setup | Yes | "The networks are implemented on PyTorch [28] and optimized using SGD with initial learning rate of 0.1, momentum of 0.9, weight decay of 0.0005, batch size of 256, total training epochs of 200. The learning rate is decayed to zero using cosine annealing schedule [21]. We use data augmentation of random horizontal flipping and cropping. We fix the hyper-parameters Es = 60, α = 0.9 by default if not specified. ... We set the initial learning rate as 0.1 and decay it by a factor of 0.1 in epochs 75 and 90, respectively. We choose 1/λ = 6.0 as suggested by [48] and use Es = 70, α = 0.9 for our approach." (a configuration sketch follows the table)
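
For context on the "Pseudocode" row: the paper's Algorithm 1 is not reproduced here, but the hyper-parameters quoted under "Experiment Setup" (Es and α) refer to the start epoch and momentum of its soft-target update. The PyTorch-style sketch below is a hedged reconstruction of that idea (per-sample soft targets updated as an exponential moving average of model predictions, with samples re-weighted by target confidence), not a verbatim copy of Algorithm 1; the class name `SelfAdaptiveLoss` and the `indices` argument are illustrative, and the released repository should be treated as authoritative.

```python
import torch
import torch.nn.functional as F


class SelfAdaptiveLoss:
    """Hedged sketch of a self-adaptive training criterion.

    Keeps one soft-target vector per training example, initialized from the
    (possibly noisy) labels. After a start epoch ``es`` the targets are
    updated as an exponential moving average of the model's predictions with
    momentum ``alpha``, and samples are re-weighted by target confidence.
    """

    def __init__(self, labels, num_classes, es=60, alpha=0.9):
        self.es = es
        self.alpha = alpha
        # One soft-target row per training example, kept on the CPU.
        self.targets = F.one_hot(labels, num_classes).float()

    def __call__(self, logits, indices, epoch):
        indices = indices.cpu()
        targets = self.targets[indices].to(logits.device)
        if epoch >= self.es:
            probs = F.softmax(logits.detach(), dim=1)
            # Exponential-moving-average update of the stored soft targets.
            targets = self.alpha * targets + (1.0 - self.alpha) * probs
            self.targets[indices] = targets.cpu()
        # Confidence of the current soft target acts as a per-sample weight.
        weights, _ = targets.max(dim=1)
        log_probs = F.log_softmax(logits, dim=1)
        per_sample = -(targets * log_probs).sum(dim=1)
        return (weights * per_sample).sum() / weights.sum()
```

In a full training loop the dataset would also have to return each example's index so that its stored target can be looked up and updated between epochs.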
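
The "Experiment Setup" excerpt (SGD with learning rate 0.1, momentum 0.9, weight decay 0.0005, batch size 256, 200 epochs, cosine annealing, random horizontal flipping and cropping) maps onto a standard PyTorch configuration. The sketch below shows only that wiring; the stand-in model, the crop padding of 4, and the empty loop body are assumptions, not details taken from the paper.

```python
import torch
from torch import nn
from torchvision import transforms

# Hyper-parameters quoted in the paper's CIFAR setup.
LR, MOMENTUM, WEIGHT_DECAY = 0.1, 0.9, 5e-4
BATCH_SIZE, EPOCHS = 256, 200

# Augmentation mentioned in the setup; padding=4 is the usual CIFAR choice
# (an assumption, the excerpt does not state it).
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Stand-in classifier; the paper trains standard CIFAR architectures.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

optimizer = torch.optim.SGD(
    model.parameters(), lr=LR, momentum=MOMENTUM, weight_decay=WEIGHT_DECAY
)
# Cosine annealing decays the learning rate towards zero over the run [21].
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)

for epoch in range(EPOCHS):
    # ... one pass over the augmented CIFAR training loader goes here ...
    scheduler.step()
```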