AutoBalance: Optimized Loss Functions for Imbalanced Data
Authors: Mingchen Li, Xuechen Zhang, Christos Thrampoulidis, Jiasi Chen, Samet Oymak
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical evaluations demonstrate the benefits of AutoBalance over state-of-the-art approaches. Our experimental findings are complemented with theoretical insights on loss function design and the benefits of train-validation split. |
| Researcher Affiliation | Academia | Mingchen Li and Xuechen Zhang, University of California, Riverside ({mli176,xzhan394}@ucr.edu); Christos Thrampoulidis, University of British Columbia (cthrampo@ece.ubc.ca); Jiasi Chen, University of California, Riverside (jiasi@cs.ucr.edu); Samet Oymak, University of California, Riverside (oymak@ece.ucr.edu) |
| Pseudocode | Yes | Algorithm 1: AutoBalance via Bilevel Optimization |
| Open Source Code | Yes | All code is available open-source. The code is available online [42]. |
| Open Datasets | Yes | We follow previous works [53, 13, 8] to construct long-tailed versions of the datasets. Specifically, for a K-class dataset, we create a long-tailed dataset by reducing the number of examples per class according to the exponential function $\bar{n}_i = n_i \mu^i$, where $n_i$ is the original number of examples for class $i$, $\bar{n}_i$ is the new number of examples for class $i$, and $\mu < 1$ is a scaling factor. We then define the imbalance factor $\rho = \bar{n}_0 / \bar{n}_K$, the ratio of the number of examples in the largest class ($\bar{n}_0$) to that in the smallest class ($\bar{n}_K$). For the CIFAR10-LT and CIFAR100-LT datasets, we construct long-tailed versions with imbalance factor $\rho = 100$. ImageNet-LT contains 115,846 training examples and 1,000 classes, with imbalance factor $\rho = 256$. iNaturalist-2018 contains 435,713 images from 8,142 classes, with imbalance factor $\rho = 500$. These choices follow those of [53]. For all datasets, we split the long-tailed training set into 80% training and 20% validation during the search phase (Figure 1b). A minimal sketch of this construction and split appears after the table. |
| Dataset Splits | Yes | We split the dataset $S$ into training $S_T$ and validation $S_V$ sets with $n_T$ and $n_V$ examples, respectively. For all datasets, we split the long-tailed training set into 80% training and 20% validation during the search phase (Figure 1b). When using Algo. 1, we split the original training data into 50% training and 50% validation. |
| Hardware Specification | No | The paper mentions using ResNet-32 and ResNet-50 models and SGD optimization, but it does not specify any particular hardware components like GPU models, CPU types, or memory amounts used for the experiments. |
| Software Dependencies | No | The paper mentions using standard mini-batch stochastic gradient descent (SGD) and AutoAugment policies, but it does not provide specific version numbers for any software libraries, frameworks (e.g., TensorFlow, PyTorch), or programming languages used. |
| Experiment Setup | Yes | In both CIFAR datasets, the lower-level optimization trains a ResNet-32 model with standard mini-batch stochastic gradient descent (SGD) using learning rate 0.1, momentum 0.9, and weight decay 1e-4, over 300 epochs. The learning rate decays at epochs 220 and 260 by a factor of 0.1. [...] using SGD with initial learning rate 0.05, momentum 0.9, and weight decay 1e-4, we follow the same learning rate decay at epochs 220 and 260. [...] For ImageNet-LT and iNaturalist, following previous work [53], we use ResNet-50 and SGD for the lower and upper optimizations. For learning rate scheduling, we use cosine scheduling starting with learning rate 0.05, and batch size 128. In the search phase, we conduct 150 epochs of training with a 40-epoch warm-up before the loss-function design starts. For the retraining phase, we train for 90 epochs, the same as [53]; due to limited training resources we change the batch size to 128 and adjust the initial learning rate accordingly, as suggested by [22]. A hedged sketch of this optimizer and schedule configuration also appears after the table. |
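
For concreteness, the exponential class-count schedule and the 80/20 search-phase split quoted above can be written out as a short sketch. This is a minimal illustration under stated assumptions, not the authors' released code: the function name `longtail_counts`, the derivation of $\mu$ from $\rho$ (assuming equal original class sizes), and the use of scikit-learn for the stratified split are choices made here purely for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def longtail_counts(n_per_class, imbalance_factor):
    """Exponential class-count schedule n_bar_i = n_i * mu**i, with mu chosen so that
    the largest-to-smallest class ratio matches the imbalance factor rho."""
    K = len(n_per_class)
    mu = imbalance_factor ** (-1.0 / (K - 1))  # assumes equal original class sizes
    return [max(1, int(round(n * mu ** i))) for i, n in enumerate(n_per_class)]

# CIFAR10-LT: 5,000 images per class originally, imbalance factor rho = 100.
counts = longtail_counts([5000] * 10, imbalance_factor=100)
print(counts[0], counts[-1])  # 5000 50  ->  rho = 100

# Search-phase split quoted above: 80% training / 20% validation, stratified by class.
labels = np.concatenate([np.full(c, i) for i, c in enumerate(counts)])
train_idx, val_idx = train_test_split(
    np.arange(len(labels)), test_size=0.2, stratify=labels, random_state=0)
```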
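
The CIFAR hyperparameters in the last row map directly onto a standard optimizer-and-scheduler configuration. The sketch below assumes PyTorch (the paper does not name its framework) and uses a stand-in module instead of the actual ResNet-32; only the quoted learning rate, momentum, weight decay, and decay milestones come from the paper.

```python
import torch

# Stand-in module so the sketch runs end to end; the paper trains a ResNet-32 on CIFAR.
model = torch.nn.Linear(32 * 32 * 3, 10)

# Lower-level optimizer quoted above: SGD with lr 0.1, momentum 0.9, weight decay 1e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)

# Learning rate decays by a factor of 0.1 at epochs 220 and 260 (300 epochs total).
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[220, 260], gamma=0.1)

for epoch in range(300):
    # ... one epoch of mini-batch SGD on the lower-level (training) loss ...
    scheduler.step()

# For ImageNet-LT / iNaturalist-2018 the paper instead uses cosine scheduling from lr 0.05,
# e.g. torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=150) for the 150-epoch search phase.
```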