Long-tail learning via logit adjustment

Authors: Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We now present experiments confirming our main claims: (i) on simple binary problems, existing weight normalisation and loss modification techniques may not converge to the optimal solution (Section 6.1); (ii) on real-world datasets, our post-hoc logit adjustment generally outperforms weight normalisation, and one can obtain further gains via our logit adjusted softmax cross-entropy (Section 6.2). We present results on the CIFAR-10, CIFAR-100, ImageNet and iNaturalist 2018 datasets. (Both techniques are illustrated in the first sketch after this table.)
Researcher Affiliation | Industry | Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit & Sanjiv Kumar, Google Research, New York, NY, {adityakmenon,sadeep,ankitsrawat,himj,aveit,sanjivk}@google.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | A reference implementation of our methods is planned for release at: https://github.com/google-research/google-research/tree/master/logit_adjustment.
Open Datasets | Yes | We present results on the CIFAR-10, CIFAR-100, ImageNet and iNaturalist 2018 datasets. Following prior work, we create long-tailed versions of the CIFAR datasets by suitably downsampling examples per label following the EXP profile of Cui et al. (2019); Cao et al. (2019) with imbalance ratio ρ = max_y P(y) / min_y P(y) = 100. Similarly, we use the long-tailed version of ImageNet produced by Liu et al. (2019). (The EXP profile is sketched in the second example after this table.)
Dataset Splits | No | The paper mentions training on the datasets and evaluating on a test set, but does not explicitly describe a separate validation split in enough detail for reproducibility; it only mentions 'holdout calibration' and tuning 'via cross-validation against the balanced error on the training set' as possibilities.
Hardware Specification | No | The paper specifies model architectures (e.g., ResNet-32, ResNet-50, ResNet-152) and training parameters such as batch size, but does not explicitly describe the hardware (CPU, GPU, or TPU models) used to run the experiments.
Software Dependencies | No | The paper mentions using SGD with momentum and certain model architectures (ResNet), but does not specify version numbers for any software dependencies, such as deep learning frameworks (e.g., PyTorch, TensorFlow) or programming languages.
Experiment Setup | Yes | Unless otherwise specified, all networks are trained with SGD with a momentum value of 0.9, a linear learning rate warm-up in the first 5 epochs to reach the base learning rate, and a weight decay of 10^-4. Other dataset-specific details are given below. CIFAR-10 and CIFAR-100: We use a CIFAR ResNet-32 model trained for 120K steps with a batch size of 128. The base learning rate is 0.1, with a linear warmup for the first 2000 steps, and a decay of 0.1 at 60K, 90K, and 110K steps. ImageNet: We use a ResNet-50 model trained for 90 epochs with a batch size of 512. The base learning rate is 0.4, with cosine learning rate decay and Nesterov momentum. We use a weight decay of 5 × 10^-4 following Kang et al. (2020). iNaturalist: We use a ResNet-50 trained for 90 epochs with a batch size of 1024. The base learning rate is 0.4, with cosine learning rate decay. (The CIFAR optimiser and schedule are sketched in the third example after this table.)
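
To make the techniques named in the Research Type row concrete, here is a minimal NumPy sketch of post-hoc logit adjustment and of a logit-adjusted softmax cross-entropy. The function names, the toy priors, and the default scaling tau=1.0 are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def posthoc_logit_adjustment(logits, class_priors, tau=1.0):
    """Post-hoc variant: subtract tau * log(prior) from each class logit of an
    already-trained model before taking the argmax, boosting rare classes."""
    return logits - tau * np.log(class_priors)

def logit_adjusted_cross_entropy(logits, labels, class_priors, tau=1.0):
    """Loss variant: softmax cross-entropy on logits shifted by +tau * log(prior),
    averaged over the batch."""
    adjusted = logits + tau * np.log(class_priors)      # shape (batch, classes)
    adjusted -= adjusted.max(axis=1, keepdims=True)     # numerical stability
    log_probs = adjusted - np.log(np.exp(adjusted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy usage: three classes with a 100:10:1 label distribution.
priors = np.array([100.0, 10.0, 1.0]); priors /= priors.sum()
logits = np.array([[2.0, 1.9, 1.8]])
print(posthoc_logit_adjustment(logits, priors).argmax(axis=1))  # picks the tail class
```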
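The EXP downsampling profile quoted in the Open Datasets row can be summarised in a few lines. The sketch below assumes 5,000 images per class for balanced CIFAR-10 and rounds counts to integers; both are assumptions for illustration rather than details stated in the quote.

```python
import numpy as np

def exp_profile_counts(num_classes=10, max_per_class=5000, imbalance_ratio=100):
    """Per-class sample counts under an exponentially decaying ('EXP') profile,
    chosen so the head/tail ratio equals the requested imbalance ratio."""
    mu = imbalance_ratio ** (-1.0 / (num_classes - 1))   # per-class decay factor
    return np.array([int(round(max_per_class * mu ** c)) for c in range(num_classes)])

counts = exp_profile_counts()          # e.g. [5000, 2997, ..., 50] for CIFAR-10-LT
print(counts, counts[0] / counts[-1])  # head/tail ratio ≈ 100 by construction
```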
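For the CIFAR recipe in the Experiment Setup row, the optimiser and learning-rate schedule could be expressed as follows. PyTorch is assumed purely for illustration and is not necessarily what the authors used; `model` stands for any CIFAR ResNet-32.

```python
import torch

def cifar_optimizer_and_schedule(model):
    """Sketch of the quoted CIFAR schedule: SGD with momentum 0.9 and weight
    decay 1e-4, base LR 0.1, linear warmup over the first 2000 steps, then
    x0.1 decays at 60K, 90K and 110K of the 120K total training steps."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)

    def lr_factor(step):
        if step < 2000:                                   # linear warmup phase
            return (step + 1) / 2000
        num_decays = sum(step >= m for m in (60_000, 90_000, 110_000))
        return 0.1 ** num_decays                          # staircase decay

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)
    return optimizer, scheduler

# Usage per training step: optimizer.step(); scheduler.step()
```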