Long-tail learning via logit adjustment
Authors: Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now present experiments confirming our main claims: (i) on simple binary problems, existing weight normalisation and loss modification techniques may not converge to the optimal solution (Section 6.1); (ii) on real-world datasets, our post-hoc logit adjustment generally outperforms weight normalisation, and one can obtain further gains via our logit adjusted softmax cross-entropy (Section 6.2). We present results on the CIFAR-10, CIFAR-100, ImageNet and iNaturalist 2018 datasets. |
| Researcher Affiliation | Industry | Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit & Sanjiv Kumar, Google Research, New York, NY, {adityakmenon,sadeep,ankitsrawat,himj,aveit,sanjivk}@google.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | A reference implementation of our methods is planned for release at: https://github.com/google-research/google-research/tree/master/logit_adjustment. |
| Open Datasets | Yes | We present results on the CIFAR-10, CIFAR-100, ImageNet and iNaturalist 2018 datasets. Following prior work, we create long-tailed versions of the CIFAR datasets by suitably downsampling examples per label following the EXP profile of Cui et al. (2019); Cao et al. (2019) with imbalance ratio ρ = max_y P(y) / min_y P(y) = 100. Similarly, we use the long-tailed version of ImageNet produced by Liu et al. (2019). |
| Dataset Splits | No | The paper mentions training on the datasets and evaluating on a test set, but does not explicitly describe a separate validation split in enough detail to reproduce; it only mentions 'holdout calibration' and tuning 'via cross-validation against the balanced error on the training set' as possibilities. |
| Hardware Specification | No | The paper specifies model architectures (e.g., ResNet-32, ResNet-50, ResNet-152) and training parameters like batch size, but does not explicitly describe the specific hardware (CPU, GPU, TPU models) used for running the experiments. |
| Software Dependencies | No | The paper mentions using SGD with momentum and certain model architectures (ResNet), but does not specify version numbers for any software dependencies like deep learning frameworks (e.g., PyTorch, TensorFlow) or programming languages. |
| Experiment Setup | Yes | Unless otherwise specified, all networks are trained with SGD with a momentum value of 0.9, a linear learning rate warm-up in the first 5 epochs to reach the base learning rate, and a weight decay of 10^-4. Other dataset-specific details are given below. CIFAR-10 and CIFAR-100: We use a CIFAR ResNet-32 model trained for 120K steps with a batch size of 128. The base learning rate is 0.1, with a linear warmup for the first 2000 steps, and a decay of 0.1 at 60K, 90K, and 110K steps. ImageNet: We use a ResNet-50 model trained for 90 epochs with a batch size of 512. The base learning rate is 0.4, with cosine learning rate decay and Nesterov momentum. We use a weight decay of 5 × 10^-4 following Kang et al. (2020). iNaturalist: We use a ResNet-50 trained for 90 epochs with a batch size of 1024. The base learning rate is 0.4, with cosine learning rate decay. |
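
For context on the method assessed in the Research Type row: both post-hoc logit adjustment and the logit-adjusted softmax cross-entropy revolve around shifting each class logit by a τ-scaled log class-prior. The sketch below is a minimal NumPy reading of that idea with τ = 1 as the default; the function names and array layout are ours, not the authors' released implementation.

```python
import numpy as np

def posthoc_logit_adjustment(logits, class_priors, tau=1.0):
    # Post-hoc correction: subtract tau * log(prior) from each class score
    # before the argmax, boosting rare classes relative to frequent ones.
    return logits - tau * np.log(class_priors)

def logit_adjusted_cross_entropy(logits, labels, class_priors, tau=1.0):
    # Training-time variant: add tau * log(prior) to the logits inside the
    # softmax, then take ordinary cross-entropy against the integer labels.
    adjusted = logits + tau * np.log(class_priors)        # (batch, classes)
    shift = adjusted.max(axis=1, keepdims=True)           # numerical stability
    log_norm = shift.squeeze(1) + np.log(np.exp(adjusted - shift).sum(axis=1))
    log_probs = adjusted - log_norm[:, None]
    return -log_probs[np.arange(len(labels)), labels].mean()

# Example: 3 classes with priors 0.7 / 0.2 / 0.1 and nearly tied raw logits;
# after adjustment the prediction shifts toward the tail class (prints [2]).
priors = np.array([0.7, 0.2, 0.1])
logits = np.array([[2.0, 1.9, 1.8]])
print(posthoc_logit_adjustment(logits, priors).argmax(axis=1))
```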
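The long-tailed CIFAR construction quoted in the Open Datasets row follows the EXP profile of Cui et al. (2019) with imbalance ratio ρ = 100. A minimal sketch of how such per-class counts are typically generated is below, assuming classes are indexed from most to least frequent; the helper name and the head-class count of 5000 (the full CIFAR-10 per-class count) are illustrative rather than taken from the paper.

```python
import numpy as np

def exp_profile_counts(num_classes, head_count, imbalance_ratio=100):
    # Exponentially decaying per-class counts so that the ratio between the
    # largest and smallest class equals the target imbalance ratio rho.
    decay = (1.0 / imbalance_ratio) ** (np.arange(num_classes) / (num_classes - 1))
    return np.floor(head_count * decay).astype(int)

# Long-tailed CIFAR-10 example: 5000 images for the head class, 50 for the tail.
counts = exp_profile_counts(num_classes=10, head_count=5000, imbalance_ratio=100)
print(counts[0], counts[-1], counts[0] / counts[-1])  # 5000 50 100.0
```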
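The Experiment Setup row fully specifies the optimiser and learning-rate schedules. As one concrete reading of the quoted CIFAR schedule (base rate 0.1, linear warmup over the first 2000 steps, decay by 0.1 at 60K/90K/110K of 120K total steps), a framework-agnostic sketch is shown below; it is an illustration of the quoted description, not the authors' training code, and the SGD momentum and weight-decay settings would be configured separately in whichever framework is used.

```python
def cifar_learning_rate(step, base_lr=0.1, warmup_steps=2000,
                        boundaries=(60_000, 90_000, 110_000), decay=0.1):
    # Linear warmup to the base rate, then multiply by the decay factor at
    # each boundary, matching the quoted 120K-step CIFAR schedule.
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    lr = base_lr
    for boundary in boundaries:
        if step >= boundary:
            lr *= decay
    return lr

# Spot-check a few points of the schedule.
for step in (0, 1000, 2000, 59_999, 60_000, 90_000, 110_000):
    print(step, round(cifar_learning_rate(step), 5))
```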