Learning to Reject Meets Long-tail Learning
Authors: Harikrishna Narasimhan, Aditya Krishna Menon, Wittawat Jitkrittum, Neha Gupta, Sanjiv Kumar
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments on benchmark image classification tasks, we show that our approach yields better trade-offs in both the balanced and worst-group error compared to L2R baselines. and We present experiments on long-tailed image classification tasks to showcase that proposed plug-in approaches for the balanced error (§4) and the worst-group error (§5) yield significantly better trade-offs than Chow's rule, despite using the same base model, and are competitive with variants of Chow's rule which require re-training the base model with a modified loss. |
| Researcher Affiliation | Industry | Harikrishna Narasimhan, Aditya Krishna Menon, Wittawat Jitkrittum, Neha Gupta, Sanjiv Kumar, Google Research, {hnarasimhan, adityakmenon, wittawat, nehagup, sanjivk}@google.com |
| Pseudocode | Yes | Algorithm 1 Cost-sensitive Plug-in (CS-plug-in) and Algorithm 2 Worst-group Plug-in |
| Open Source Code | No | No explicit statement about providing open-source code for the described methodology was found. |
| Open Datasets | Yes | We use long-tailed versions of CIFAR-100 (Krizhevsky, 2009), ImageNet (Deng et al., 2009) and iNaturalist (Van Horn et al., 2018). |
| Dataset Splits | Yes | Furthermore, we hold out 20% of the original test set as a validation sample and use the remaining as the test sample. |
| Hardware Specification | No | The paper mentions training models like ResNet-32/50 but provides no specific hardware details such as GPU/CPU models, memory, or cloud instances used for computation. |
| Software Dependencies | No | The paper mentions using SGD and specific hyperparameters but does not list any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | We summarise the hyper-parameter choices in Table 3 below. For CIFAR-100, we apply a warm up with a linear learning rate for 15 steps until we reach the base learning rate. We apply a learning rate decay of 0.1 at the 96th, 192nd and 224th epochs. |
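
The quotes in the Research Type row compare the paper's plug-in approaches against Chow's rule, the classical rejection baseline that abstains whenever the top predicted class probability falls below a confidence threshold. Since no open-source code is provided, the snippet below is only a minimal sketch of that baseline; the threshold value and example probabilities are illustrative, not taken from the paper.

```python
import numpy as np

def chow_rule(probs: np.ndarray, threshold: float):
    """Chow's rule baseline: predict argmax_y p(y|x), but reject (return None)
    whenever the top class probability falls below the confidence threshold."""
    top = int(np.argmax(probs))
    return top if probs[top] >= threshold else None

# Illustrative values only (not from the paper).
print(chow_rule(np.array([0.70, 0.20, 0.10]), threshold=0.6))   # -> 0
print(chow_rule(np.array([0.40, 0.35, 0.25]), threshold=0.6))   # -> None (reject)
```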
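
The Dataset Splits row states that 20% of the original test set is held out as a validation sample. A minimal sketch of such a holdout, assuming a random shuffle with a fixed seed (the paper excerpt does not specify how the 20% is selected):

```python
import numpy as np

def holdout_split(test_indices: np.ndarray, val_fraction: float = 0.2, seed: int = 0):
    """Hold out a fraction of the original test indices as a validation sample
    and keep the rest as the test sample (seed and shuffling are assumptions)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(test_indices)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[:n_val], shuffled[n_val:]    # (validation, test)

val_idx, test_idx = holdout_split(np.arange(10_000))   # e.g. CIFAR-100 test size
print(len(val_idx), len(test_idx))                     # -> 2000 8000
```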
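
The Experiment Setup row describes, for CIFAR-100, a 15-step linear warm-up to the base learning rate followed by decay by a factor of 0.1 at epochs 96, 192 and 224. The sketch below implements such a schedule as a plain function; the base learning rate of 0.1 and the steps-per-epoch value are placeholders, since neither is given in the excerpt.

```python
BASE_LR = 0.1            # placeholder; the excerpt does not state the base rate
WARMUP_STEPS = 15
DECAY_EPOCHS = (96, 192, 224)

def learning_rate(epoch: int, step_in_epoch: int, steps_per_epoch: int) -> float:
    """Learning rate with a 15-step linear warm-up, then step decay by 0.1."""
    global_step = epoch * steps_per_epoch + step_in_epoch
    if global_step < WARMUP_STEPS:
        # Linear ramp from near zero up to the base learning rate.
        return BASE_LR * (global_step + 1) / WARMUP_STEPS
    # Multiply by 0.1 once for every decay milestone already passed.
    decay_power = sum(epoch >= e for e in DECAY_EPOCHS)
    return BASE_LR * (0.1 ** decay_power)

# Example: by epoch 200 the rate has decayed twice, 0.1 -> 0.01 -> 0.001.
assert abs(learning_rate(200, 0, 390) - 0.001) < 1e-9
```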