AutoBalance: Optimized Loss Functions for Imbalanced Data
Authors: Mingchen Li, Xuechen Zhang, Christos Thrampoulidis, Jiasi Chen, Samet Oymak
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical evaluations demonstrate the benefits of AutoBalance over state-of-the-art approaches. Our experimental findings are complemented with theoretical insights on loss function design and the benefits of train-validation split. |
| Researcher Affiliation | Academia | Mingchen Li and Xuechen Zhang, University of California, Riverside ({mli176,xzhan394}@ucr.edu); Christos Thrampoulidis, University of British Columbia (cthrampo@ece.ubc.ca); Jiasi Chen, University of California, Riverside (jiasi@cs.ucr.edu); Samet Oymak, University of California, Riverside (oymak@ece.ucr.edu) |
| Pseudocode | Yes | Algorithm 1: AutoBalance via Bilevel Optimization |
| Open Source Code | Yes | All code is available open-source. The code is available online [42]. |
| Open Datasets | Yes | We follow previous works [53, 13, 8] to construct long-tailed versions of the datasets. Specifically, for a K-class dataset, we create a long-tailed dataset by reducing the number of examples per class according to the exponential function $\bar{n}_i = n_i \mu^i$, where $n_i$ is the original number of examples for class $i$, $\bar{n}_i$ is the new number of examples for class $i$, and $\mu < 1$ is a scaling factor. We then define the imbalance factor $\rho = \bar{n}_0 / \bar{n}_K$, the ratio of the number of examples in the largest class ($\bar{n}_0$) to that in the smallest class ($\bar{n}_K$). For the CIFAR10-LT and CIFAR100-LT datasets, we construct long-tailed versions with imbalance factor $\rho = 100$. ImageNet-LT contains 115,846 training examples and 1,000 classes, with imbalance factor $\rho = 256$. iNaturalist-2018 contains 435,713 images from 8,142 classes, with imbalance factor $\rho = 500$. These choices follow those of [53]. For all datasets, we split the long-tailed training set into 80% training and 20% validation during the search phase (Figure 1b). A minimal sketch of this construction and split appears after the table. |
| Dataset Splits | Yes | We split the dataset $S$ into training $S_T$ and validation $S_V$ sets with $n_T$ and $n_V$ examples, respectively. For all datasets, we split the long-tailed training set into 80% training and 20% validation during the search phase (Figure 1b). When using Algo. 1, we split the original training data into 50% training and 50% validation. |
| Hardware Specification | No | The paper mentions using ResNet-32 and ResNet-50 models and SGD optimization, but it does not specify any particular hardware components like GPU models, CPU types, or memory amounts used for the experiments. |
| Software Dependencies | No | The paper mentions using standard mini-batch stochastic gradient descent (SGD) and AutoAugment policies, but it does not provide specific version numbers for any software libraries, frameworks (e.g., TensorFlow, PyTorch), or programming languages used. |
| Experiment Setup | Yes | In both CIFAR datasets, the lower-level optimization trains a ResNet-32 model with standard mini-batch stochastic gradient descent (SGD) using learning rate 0.1, momentum 0.9, and weight decay 1e-4, over 300 epochs. The learning rate decays at epochs 220 and 260 by a factor of 0.1. [...] using SGD with initial learning rate 0.05, momentum 0.9, and weight decay 1e-4, we follow the same learning rate decay at epochs 220 and 260. [...] For ImageNet-LT and iNaturalist, following previous work [53], we use ResNet-50 and SGD for the lower and upper optimizations. For learning rate scheduling, we use cosine scheduling starting with learning rate 0.05, and batch size 128. In the search phase, we conduct 150 epochs of training with a 40-epoch warm-up before the loss-function design starts. For the retraining phase, we train for 90 epochs, the same as [53]; due to limited training resources we change the batch size to 128 and adjust the initial learning rate accordingly, as suggested by [22]. A hedged sketch of this optimizer and schedule configuration also appears after the table. |
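
For concreteness, the exponential class-count schedule and the 80/20 search-phase split quoted above can be written out as a short sketch. This is a minimal illustration under stated assumptions, not the authors' released code: the function name `longtail_counts`, the derivation of $\mu$ from $\rho$ (assuming equal original class sizes), and the use of scikit-learn for the stratified split are choices made here purely for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def longtail_counts(n_per_class, imbalance_factor):
    """Exponential class-count schedule n_bar_i = n_i * mu**i, with mu chosen so that
    the largest-to-smallest class ratio matches the imbalance factor rho."""
    K = len(n_per_class)
    mu = imbalance_factor ** (-1.0 / (K - 1))  # assumes equal original class sizes
    return [max(1, int(round(n * mu ** i))) for i, n in enumerate(n_per_class)]

# CIFAR10-LT: 5,000 images per class originally, imbalance factor rho = 100.
counts = longtail_counts([5000] * 10, imbalance_factor=100)
print(counts[0], counts[-1])  # 5000 50  ->  rho = 100

# Search-phase split quoted above: 80% training / 20% validation, stratified by class.
labels = np.concatenate([np.full(c, i) for i, c in enumerate(counts)])
train_idx, val_idx = train_test_split(
    np.arange(len(labels)), test_size=0.2, stratify=labels, random_state=0)
```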
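
The CIFAR hyperparameters in the last row map directly onto a standard optimizer-and-scheduler configuration. The sketch below assumes PyTorch (the paper does not name its framework) and uses a stand-in module instead of the actual ResNet-32; only the quoted learning rate, momentum, weight decay, and decay milestones come from the paper.

```python
import torch

# Stand-in module so the sketch runs end to end; the paper trains a ResNet-32 on CIFAR.
model = torch.nn.Linear(32 * 32 * 3, 10)

# Lower-level optimizer quoted above: SGD with lr 0.1, momentum 0.9, weight decay 1e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)

# Learning rate decays by a factor of 0.1 at epochs 220 and 260 (300 epochs total).
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[220, 260], gamma=0.1)

for epoch in range(300):
    # ... one epoch of mini-batch SGD on the lower-level (training) loss ...
    scheduler.step()

# For ImageNet-LT / iNaturalist-2018 the paper instead uses cosine scheduling from lr 0.05,
# e.g. torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=150) for the 150-epoch search phase.
```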