Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data

Authors: Harsh Rangwani, Sumukh K Aithal, Mayank Mishra, R. Venkatesh Babu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using SAM results in a 6.2% increase in accuracy on the minority classes over the state-of-the-art Vector Scaling Loss, leading to an overall average increase of 4% across imbalanced datasets. The code is available at https://github.com/val-iisc/Saddle-LongTail. (Section 5, Experiments)
Researcher Affiliation | Academia | Harsh Rangwani, Sumukh K Aithal, Mayank Mishra, R. Venkatesh Babu; Video Analytics Lab, Indian Institute of Science, Bengaluru, India {harshr@iisc.ac.in, sumukhaithal6@gmail.com, mayankmishra@iisc.ac.in, venky@iisc.ac.in}
Pseudocode | Yes | The algorithm for DRW+SAM is defined in App. G. (A hedged sketch of such a step is given after this table.)
Open Source Code | Yes | The code is available at https://github.com/val-iisc/Saddle-LongTail.
Open Datasets | Yes | We report our results on four long-tailed datasets: CIFAR-10 LT [9], CIFAR-100 LT [9], ImageNet-LT [34], and iNaturalist 2018 [44]. a) CIFAR-10 LT and CIFAR-100 LT: The original CIFAR-10 and CIFAR-100 datasets consist of 50,000 training images and 10,000 validation images, spread across 10 and 100 classes, respectively.
Dataset Splits | Yes | The original CIFAR-10 and CIFAR-100 datasets consist of 50,000 training images and 10,000 validation images, spread across 10 and 100 classes, respectively; the long-tailed variants are obtained by subsampling these training sets. (An illustrative subsampling sketch follows the table.)
Hardware Specification | No | The main body of the paper does not state specific hardware details such as GPU models, CPU types, or memory. It notes that compute resources are 'Added in the Appendix', but the appendix content is not included in the provided text.
Software Dependencies | No | The paper mentions software components such as the ResNet-32 architecture and SGD, but provides no version numbers for any libraries, frameworks, or programming languages used (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | We follow the hyperparameters and setup of Cao et al. [9] for the CIFAR-10 LT and CIFAR-100 LT datasets. We train a ResNet-32 backbone, with SGD (momentum 0.9) as the base optimizer, for 200 epochs. A multi-step schedule drops the learning rate to 0.01x and 0.0001x of its initial value at the 160th and 180th epochs, respectively. For training with SAM, we set a constant ρ value of either 0.5 or 0.8 for most methods. (A configuration sketch follows the table.)
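
The DRW+SAM procedure itself is defined in Appendix G of the paper and in the released code. As a rough illustration only, below is a minimal PyTorch-style sketch of a single training step combining Sharpness-Aware Minimization with deferred re-weighting: plain cross-entropy for the early epochs, then class-balanced weights after the re-weighting epoch. The helper names (`sam_drw_step`, `class_balanced_weights`), the effective-number weighting of Cui et al., and the gradient-norm handling are assumptions made for this sketch, not the authors' implementation.

```python
# Minimal sketch of one SAM step with deferred re-weighting (DRW).
# Assumption: a PyTorch model/optimizer; the authors' exact algorithm is in App. G.
import torch
import torch.nn.functional as F

def class_balanced_weights(class_counts, beta=0.9999):
    # Effective-number re-weighting (Cui et al., 2019), commonly paired with DRW.
    counts = torch.tensor(class_counts, dtype=torch.float)
    effective_num = 1.0 - torch.pow(beta, counts)
    weights = (1.0 - beta) / effective_num
    return weights / weights.sum() * len(class_counts)

def sam_drw_step(model, optimizer, images, targets, epoch,
                 class_counts, rho=0.5, drw_start_epoch=160):
    # DRW: uniform loss for early epochs, class-balanced loss afterwards.
    weights = None
    if epoch >= drw_start_epoch:
        weights = class_balanced_weights(class_counts).to(images.device)

    # First pass: gradient at the current weights.
    loss = F.cross_entropy(model(images), targets, weight=weights)
    loss.backward()

    # Ascent step: perturb parameters along the normalized gradient (SAM).
    with torch.no_grad():
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]))
        perturbations = []
        for p in model.parameters():
            if p.grad is None:
                perturbations.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            perturbations.append(e)

    # Second pass: gradient at the perturbed point, then undo the perturbation.
    optimizer.zero_grad()
    F.cross_entropy(model(images), targets, weight=weights).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), perturbations):
            if e is not None:
                p.sub_(e)

    # The base optimizer (SGD with momentum) applies the sharpness-aware gradient
    # at the original (unperturbed) weights.
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```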
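For the CIFAR-LT rows above, the long-tailed training sets are commonly built by subsampling each class of the balanced CIFAR training set according to an exponential profile, following Cao et al. [9]. The sketch below illustrates that profile; the imbalance factor of 100 and the function name `long_tailed_counts` are example assumptions, not values quoted from this paper.

```python
# Hedged sketch: per-class sample counts for a long-tailed CIFAR split,
# assuming the exponential profile of Cao et al.; the imbalance factor (100)
# is an illustrative value, not one quoted from this paper.
def long_tailed_counts(num_classes=10, max_per_class=5000, imbalance_factor=100):
    # Class i keeps max_per_class * (1 / imbalance_factor) ** (i / (num_classes - 1))
    # samples, so the head class keeps 5000 images and the tail class keeps 50.
    counts = []
    for i in range(num_classes):
        frac = (1.0 / imbalance_factor) ** (i / (num_classes - 1))
        counts.append(int(max_per_class * frac))
    return counts

# Example: CIFAR-10 LT with imbalance factor 100 -> [5000, 2997, 1796, ..., 50].
print(long_tailed_counts())
```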
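The experiment-setup row maps directly onto a small optimizer/schedule configuration. The sketch below reflects the quoted setup (SGD with momentum 0.9, 200 epochs, learning-rate drops at epochs 160 and 180, constant SAM ρ of 0.5 or 0.8); the base learning rate of 0.1 and weight decay of 2e-4 are assumptions taken from Cao et al.'s public recipe rather than from this paper.

```python
# Hedged configuration sketch for the CIFAR-LT setup described in the table.
# Assumptions: base_lr = 0.1 and weight_decay = 2e-4 (common in Cao et al.'s recipe).
import torch

def build_optimizer_and_schedule(model, base_lr=0.1, weight_decay=2e-4):
    # SGD with momentum 0.9 as the base optimizer, trained for 200 epochs.
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                                momentum=0.9, weight_decay=weight_decay)
    # Multi-step schedule: lr becomes 0.01x at epoch 160 and 0.0001x at epoch 180.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[160, 180], gamma=0.01)
    return optimizer, scheduler

# SAM perturbation radius, held constant at 0.5 or 0.8 for most methods.
SAM_RHO = 0.5
```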