Long-tail learning via logit adjustment
Authors: Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now present experiments confirming our main claims: (i) on simple binary problems, existing weight normalisation and loss modification techniques may not converge to the optimal solution (Section 6.1); (ii) on real-world datasets, our post-hoc logit adjustment generally outperforms weight normalisation, and one can obtain further gains via our logit adjusted softmax cross-entropy (Section 6.2). We present results on the CIFAR-10, CIFAR-100, ImageNet and iNaturalist 2018 datasets. |
| Researcher Affiliation | Industry | Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit & Sanjiv Kumar, Google Research, New York, NY, {adityakmenon,sadeep,ankitsrawat,himj,aveit,sanjivk}@google.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | A reference implementation of our methods is planned for release at: https://github.com/google-research/google-research/tree/master/logit_adjustment. |
| Open Datasets | Yes | We present results on the CIFAR-10, CIFAR-100, ImageNet and iNaturalist 2018 datasets. Following prior work, we create long-tailed versions of the CIFAR datasets by suitably downsampling examples per label following the EXP profile of Cui et al. (2019); Cao et al. (2019) with imbalance ratio ρ = max_y P(y) / min_y P(y) = 100. Similarly, we use the long-tailed version of ImageNet produced by Liu et al. (2019). |
| Dataset Splits | No | The paper mentions training on the datasets and evaluating on a test set, but does not explicitly describe a separate validation split in enough detail to reproduce; it only mentions 'holdout calibration' and tuning 'via cross-validation against the balanced error on the training set' as possibilities. |
| Hardware Specification | No | The paper specifies model architectures (e.g., ResNet-32, ResNet-50, ResNet-152) and training parameters like batch size, but does not explicitly describe the specific hardware (CPU, GPU, TPU models) used for running the experiments. |
| Software Dependencies | No | The paper mentions using SGD with momentum and certain model architectures (ResNet), but does not specify version numbers for any software dependencies like deep learning frameworks (e.g., PyTorch, TensorFlow) or programming languages. |
| Experiment Setup | Yes | Unless otherwise specified, all networks are trained with SGD with a momentum value of 0.9, a linear learning rate warm-up in the first 5 epochs to reach the base learning rate, and a weight decay of 10^-4. Other dataset-specific details are given below. CIFAR-10 and CIFAR-100: We use a CIFAR ResNet-32 model trained for 120K steps with a batch size of 128. The base learning rate is 0.1, with a linear warmup for the first 2000 steps, and a decay of 0.1 at 60K, 90K, and 110K steps. ImageNet: We use a ResNet-50 model trained for 90 epochs with a batch size of 512. The base learning rate is 0.4, with cosine learning rate decay and Nesterov momentum. We use a weight decay of 5 × 10^-4 following Kang et al. (2020). iNaturalist: We use a ResNet-50 trained for 90 epochs with a batch size of 1024. The base learning rate is 0.4, with cosine learning rate decay. |
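
For context on the method assessed in the Research Type row: both post-hoc logit adjustment and the logit-adjusted softmax cross-entropy revolve around shifting each class logit by a τ-scaled log class-prior. The sketch below is a minimal NumPy reading of that idea with τ = 1 as the default; the function names and array layout are ours, not the authors' released implementation.

```python
import numpy as np

def posthoc_logit_adjustment(logits, class_priors, tau=1.0):
    # Post-hoc correction: subtract tau * log(prior) from each class score
    # before the argmax, boosting rare classes relative to frequent ones.
    return logits - tau * np.log(class_priors)

def logit_adjusted_cross_entropy(logits, labels, class_priors, tau=1.0):
    # Training-time variant: add tau * log(prior) to the logits inside the
    # softmax, then take ordinary cross-entropy against the integer labels.
    adjusted = logits + tau * np.log(class_priors)        # (batch, classes)
    shift = adjusted.max(axis=1, keepdims=True)           # numerical stability
    log_norm = shift.squeeze(1) + np.log(np.exp(adjusted - shift).sum(axis=1))
    log_probs = adjusted - log_norm[:, None]
    return -log_probs[np.arange(len(labels)), labels].mean()

# Example: 3 classes with priors 0.7 / 0.2 / 0.1 and nearly tied raw logits;
# after adjustment the prediction shifts toward the tail class (prints [2]).
priors = np.array([0.7, 0.2, 0.1])
logits = np.array([[2.0, 1.9, 1.8]])
print(posthoc_logit_adjustment(logits, priors).argmax(axis=1))
```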
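The long-tailed CIFAR construction quoted in the Open Datasets row follows the EXP profile of Cui et al. (2019) with imbalance ratio ρ = 100. A minimal sketch of how such per-class counts are typically generated is below, assuming classes are indexed from most to least frequent; the helper name and the head-class count of 5000 (the full CIFAR-10 per-class count) are illustrative rather than taken from the paper.

```python
import numpy as np

def exp_profile_counts(num_classes, head_count, imbalance_ratio=100):
    # Exponentially decaying per-class counts so that the ratio between the
    # largest and smallest class equals the target imbalance ratio rho.
    decay = (1.0 / imbalance_ratio) ** (np.arange(num_classes) / (num_classes - 1))
    return np.floor(head_count * decay).astype(int)

# Long-tailed CIFAR-10 example: 5000 images for the head class, 50 for the tail.
counts = exp_profile_counts(num_classes=10, head_count=5000, imbalance_ratio=100)
print(counts[0], counts[-1], counts[0] / counts[-1])  # 5000 50 100.0
```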
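The Experiment Setup row fully specifies the optimiser and learning-rate schedules. As one concrete reading of the quoted CIFAR schedule (base rate 0.1, linear warmup over the first 2000 steps, decay by 0.1 at 60K/90K/110K of 120K total steps), a framework-agnostic sketch is shown below; it is an illustration of the quoted description, not the authors' training code, and the SGD momentum and weight-decay settings would be configured separately in whichever framework is used.

```python
def cifar_learning_rate(step, base_lr=0.1, warmup_steps=2000,
                        boundaries=(60_000, 90_000, 110_000), decay=0.1):
    # Linear warmup to the base rate, then multiply by the decay factor at
    # each boundary, matching the quoted 120K-step CIFAR schedule.
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    lr = base_lr
    for boundary in boundaries:
        if step >= boundary:
            lr *= decay
    return lr

# Spot-check a few points of the schedule.
for step in (0, 1000, 2000, 59_999, 60_000, 90_000, 110_000):
    print(step, round(cifar_learning_rate(step), 5))
```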