Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization
Authors: Kaidi Cao, Yining Chen, Junwei Lu, Nikos Arechiga, Adrien Gaidon, Tengyu Ma
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our method on several benchmark tasks, including a real-world heteroskedastic and imbalanced dataset, WebVision. Our experiments corroborate our theory and demonstrate a significant improvement over other methods in noise-robust deep learning. and We experimentally show that our proposed algorithm HAR (Algorithm 1) improves the test performance of the noisier and rarer groups of examples (by stronger regularization) without negatively affecting the training and test performance of the other groups. We evaluate our algorithms on three vision datasets and one NLP dataset: CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009), IMDB-review (Maas et al., 2011) (see Appendix C.1), and WebVision (Li et al., 2017), a real-world heteroskedastic and imbalanced dataset. |
| Researcher Affiliation | Collaboration | Kaidi Cao1, Yining Chen1, Junwei Lu2, Nikos Arechiga3, Adrien Gaidon3, Tengyu Ma1 1Stanford University, 2Harvard University, 3Toyota Research Institute |
| Pseudocode | Yes | Algorithm 1 Heteroskedastic Adaptive Regularization (HAR) |
| Open Source Code | Yes | Code available at https://github.com/kaidic/HAR. |
| Open Datasets | Yes | We evaluate our algorithms on three vision datasets and one NLP dataset: CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009), IMDB-review (Maas et al., 2011) (see Appendix C.1), and WebVision (Li et al., 2017), a real-world heteroskedastic and imbalanced dataset. |
| Dataset Splits | Yes | Split training set D into D_train and D_val (Algorithm 1, step 1); by default, D is split equally and randomly into D_train and D_val. See the split sketch after the table. |
| Hardware Specification | Yes | We train each model with 1 NVIDIA GeForce RTX 2080 Ti. and We train each model with 8 NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | We develop our core algorithm in PyTorch (Paszke et al., 2017). and The network is trained for 20 epochs with the Adam optimizer (Kingma & Ba, 2014). (Specific software versions are not provided.) |
| Experiment Setup | Yes | We use standard SGD with momentum of 0.9 and weight decay of 1 × 10⁻⁴ for training. The model is trained with a batch size of 128 for 120 epochs. We anneal the learning rate by a factor of 10 at 80 and 100 epochs. and The initial learning rate is 0.1 and is annealed by a factor of 10 at epochs 60 and 90. (A hedged training-setup sketch follows the table.) |
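
The equal, random split of the training set into D_train and D_val quoted in the Dataset Splits row maps onto a standard PyTorch utility. The following is a minimal sketch assuming a PyTorch workflow; the dummy `TensorDataset` stand-in, the variable names, and the fixed seed are illustrative assumptions, not the authors' code.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Dummy stand-in for the paper's training set D; a real run would load
# CIFAR-10/100, IMDB-review, or WebVision here.
full_train_set = TensorDataset(torch.randn(1000, 3, 32, 32),
                               torch.randint(0, 10, (1000,)))

# Equal, random split of D into D_train and D_val, as described in Algorithm 1.
n_total = len(full_train_set)
n_train = n_total // 2
d_train, d_val = random_split(
    full_train_set,
    [n_train, n_total - n_train],
    generator=torch.Generator().manual_seed(0),  # seed choice is an assumption
)
print(len(d_train), len(d_val))  # 500 500
```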
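
The quoted training hyperparameters (SGD with momentum 0.9, weight decay 1 × 10⁻⁴, batch size 128, 120 epochs, initial learning rate 0.1 annealed by a factor of 10 at epochs 80 and 100) can be expressed with standard PyTorch components. The sketch below is a reconstruction under those numbers only; the `resnet18` backbone, the dummy data, and the plain cross-entropy loss are placeholders and do not include HAR's adaptive regularization.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

# Dummy CIFAR-shaped data; a real run would iterate over the D_train split above.
train_loader = DataLoader(
    TensorDataset(torch.randn(1024, 3, 32, 32), torch.randint(0, 10, (1024,))),
    batch_size=128, shuffle=True)

model = resnet18(num_classes=10)   # placeholder backbone (assumption)
criterion = nn.CrossEntropyLoss()  # HAR's adaptive regularizer is omitted here

# Quoted setup: SGD with momentum 0.9, weight decay 1e-4, initial learning rate 0.1.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

# Quoted schedule: anneal the LR by a factor of 10 at epochs 80 and 100 (120 epochs total).
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 100], gamma=0.1)

for epoch in range(120):
    for inputs, targets in train_loader:  # batch size 128 per the quoted setup
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```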