Self-Adaptive Training: beyond Empirical Risk Minimization
Authors: Lang Huang, Chao Zhang, Hongyang Zhang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the CIFAR and ImageNet datasets verify the effectiveness of our approach in two applications: classification with label noise and selective classification. |
| Researcher Affiliation | Academia | Lang Huang (Peking University, laynehuang@pku.edu.cn); Chao Zhang (Peking University, c.zhang@pku.edu.cn); Hongyang Zhang (TTIC, hongyanz@ttic.edu) |
| Pseudocode | Yes | Algorithm 1 Self-Adaptive Training (a hedged sketch of the procedure follows the table) |
| Open Source Code | Yes | The code is available at https://github.com/LayneH/self-adaptive-training. |
| Open Datasets | Yes | We conduct the experiments on the CIFAR10 and CIFAR100 datasets [18]... on ImageNet under both the standard setup (i.e., using original labels) and the case that 40% of training labels are corrupted. |
| Dataset Splits | Yes | In this section, we conduct the experiments on the CIFAR10 dataset [18], of which we split the original training data into a training set (the first 45,000 data pairs) and a validation set (the last 5,000 data pairs). |
| Hardware Specification | No | The paper does not provide specific hardware details (like CPU/GPU models or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper mentions "implemented on PyTorch [28]" but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The networks are implemented on PyTorch [28] and optimized using SGD with an initial learning rate of 0.1, momentum of 0.9, weight decay of 0.0005, batch size of 256, and 200 total training epochs. The learning rate is decayed to zero using a cosine annealing schedule [21]. We use data augmentation of random horizontal flipping and cropping. We fix the hyper-parameters Es = 60, α = 0.9 by default if not specified. ... We set the initial learning rate as 0.1 and decay it by a factor of 0.1 at epochs 75 and 90, respectively. We choose 1/λ = 6.0 as suggested by [48] and use Es = 70, α = 0.9 for our approach. |
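
The rows above quote the paper's training recipe and reference its Algorithm 1 (Self-Adaptive Training). The PyTorch-style loop below is a minimal, hedged sketch of how the two might fit together: soft targets are initialized from the labels and, after Es warm-up epochs, blended with the model's predictions via an exponential moving average. The hyper-parameters (lr 0.1, momentum 0.9, weight decay 0.0005, batch size 256, 200 epochs, cosine annealing, Es = 60, α = 0.9) come from the quoted setup; the `model`, the index-returning data loader, and the confidence re-weighting details are assumptions, not the authors' reference implementation (see their repository for that).

```python
# Hedged sketch of Self-Adaptive Training with the quoted hyper-parameters.
# Assumptions: `model` is a classifier, and `train_loader` yields
# (images, labels, sample_indices) so that per-sample soft targets can be updated.
import torch
import torch.nn.functional as F

def train_self_adaptive(model, train_loader, num_classes,
                        epochs=200, e_s=60, alpha=0.9, device="cuda"):
    # Soft targets are initialized from the (possibly noisy) one-hot labels.
    n = len(train_loader.dataset)
    targets = torch.zeros(n, num_classes, device=device)
    for _, labels, indices in train_loader:
        targets[indices] = F.one_hot(labels, num_classes).float().to(device)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=5e-4)
    # Learning rate decayed to zero with a cosine annealing schedule.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

    for epoch in range(epochs):
        for images, labels, indices in train_loader:
            images = images.to(device)
            logits = model(images)
            probs = F.softmax(logits, dim=1)

            if epoch >= e_s:
                # After E_s warm-up epochs, blend predictions into the targets:
                # t_i <- alpha * t_i + (1 - alpha) * p_i.
                with torch.no_grad():
                    targets[indices] = alpha * targets[indices] + (1 - alpha) * probs

            # Cross-entropy against the soft targets; samples are re-weighted by
            # target confidence (an assumption based on Algorithm 1's weighting).
            weights = targets[indices].max(dim=1).values
            per_sample = -(targets[indices] * F.log_softmax(logits, dim=1)).sum(dim=1)
            loss = (weights * per_sample).sum() / weights.sum()

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```

During the first Es epochs the targets remain one-hot, so the loss reduces to ordinary cross-entropy; only afterwards does the moving-average update let the model's own (presumably cleaner) predictions gradually override noisy labels, which is the mechanism the paper evaluates on label-noise and selective-classification benchmarks.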