Learning to Reject Meets Long-tail Learning

Authors: Harikrishna Narasimhan, Aditya Krishna Menon, Wittawat Jitkrittum, Neha Gupta, Sanjiv Kumar

ICLR 2024

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | "Through experiments on benchmark image classification tasks, we show that our approach yields better trade-offs in both the balanced and worst-group error compared to L2R baselines." and "We present experiments on long-tailed image classification tasks to showcase that proposed plug-in approaches for the balanced error (§4) and the worst-group error (§5) yield significantly better trade-offs than Chow's rule, despite using the same base model, and are competitive with variants of Chow's rule which require re-training the base model with a modified loss."
Researcher Affiliation | Industry | Harikrishna Narasimhan, Aditya Krishna Menon, Wittawat Jitkrittum, Neha Gupta, Sanjiv Kumar; Google Research; {hnarasimhan, adityakmenon, wittawat, nehagup, sanjivk}@google.com
Pseudocode | Yes | Algorithm 1 (Cost-sensitive Plug-in, CS-plug-in) and Algorithm 2 (Worst-group Plug-in)
Open Source Code | No | No explicit statement about providing open-source code for the described methodology was found.
Open Datasets | Yes | "We use long-tailed versions of CIFAR-100 (Krizhevsky, 2009), ImageNet (Deng et al., 2009) and iNaturalist (Van Horn et al., 2018)."
Dataset Splits | Yes | "Furthermore, we hold out 20% of the original test set as a validation sample and use the remaining as the test sample."
Hardware Specification | No | The paper mentions training models such as ResNet-32/50 but provides no specific hardware details such as GPU/CPU models, memory, or cloud instances used for computation.
Software Dependencies | No | The paper mentions using SGD and specific hyper-parameters but does not list any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | "We summarise the hyper-parameter choices in Table 3 below. For CIFAR-100, we apply a warm-up with a linear learning rate for 15 steps until we reach the base learning rate. We apply a learning rate decay of 0.1 at the 96th, 192nd and 224th epochs."
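The CIFAR-100 schedule reported in the last row (linear warm-up for 15 steps, then a 0.1 decay at epochs 96, 192, and 224) can be sketched as a small helper. This is an illustrative sketch only: the base learning rate value and the step/epoch bookkeeping are assumptions, not taken from the paper (the actual hyper-parameter values are in the paper's Table 3).

```python
def lr_at(step, epoch, base_lr, warmup_steps=15,
          decay_epochs=(96, 192, 224), decay=0.1):
    """Learning rate per the reported CIFAR-100 schedule (sketch).

    Linear warm-up over the first `warmup_steps` optimisation steps,
    then a multiplicative decay of `decay` at each epoch in
    `decay_epochs`. `base_lr` is a placeholder; see the paper's Table 3.
    """
    if step < warmup_steps:
        # Linear ramp from base_lr/warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Apply one factor of `decay` for every milestone epoch reached.
    n_decays = sum(epoch >= e for e in decay_epochs)
    return base_lr * decay ** n_decays
```

For example, with `base_lr=0.1`, the rate ramps linearly over the first 15 steps, stays at 0.1 until epoch 96, then drops to 0.01, 0.001, and 0.0001 at the three milestones.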