Selective Mixup Fine-Tuning for Optimizing Non-Decomposable Objectives

Authors: Shrinivas Ramasubramanian, Harsh Rangwani, Sho Takemori, Kunal Samanta, Yuhei Umeda, Venkatesh Babu Radhakrishnan

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We comprehensively evaluate our technique against existing empirical and theoretically principled methods on standard benchmark datasets for imbalanced classification. We find that the proposed SelMix fine-tuning significantly improves performance for various practical non-decomposable objectives across benchmarks.
Researcher Affiliation | Collaboration | Shrinivas Ramasubramanian (Fujitsu Research of India), Harsh Rangwani (Indian Institute of Science), Sho Takemori (Fujitsu Limited), Kunal Samanta (Indian Institute of Science), Yuhei Umeda (Fujitsu Limited), R. Venkatesh Babu (Indian Institute of Science)
Pseudocode | Yes | Algorithm 1: Training through SelMix. (A hedged sketch of one such fine-tuning step appears after this table.)
Open Source Code | Yes | Code will be available here: github.com/val-iisc/SelMix/.
Open Datasets | Yes | We comprehensively evaluate our technique against existing empirical and theoretically principled methods on standard benchmark datasets for imbalanced classification. We find that the proposed SelMix fine-tuning significantly improves performance for various practical non-decomposable objectives across benchmarks. For the experiments on the long-tailed supervised dataset, we consider the long-tailed versions of CIFAR-10, CIFAR-100, and ImageNet-1k.
Dataset Splits | Yes | We comprehensively evaluate our technique against existing empirical and theoretically principled methods on standard benchmark datasets for imbalanced classification. For the experiments on the long-tailed supervised dataset, we consider the long-tailed versions of CIFAR-10, CIFAR-100, and ImageNet-1k. The parameters for the datasets are available in Tab. G.1. (A sketch of the standard long-tailed CIFAR construction appears after this table.)
Hardware Specification | Yes | The experiments were done on an Nvidia A5000 GPU (24 GB). While the fine-tuning was done on a single A5000, the pre-training was done using PyTorch data parallelism on 4x A5000. (A minimal data-parallel setup sketch appears after this table.)
Software Dependencies | No | The paper mentions software such as PyTorch and specific components such as Wide ResNet-28-2, FixMatch, the SGD optimizer, and the logit-adjusted (LA) cross-entropy loss, but does not provide version numbers for these software components.
Experiment Setup | Yes | Training details: Our classifier comprises a feature extractor g : X → R^d and a linear layer with weight W (see Sec. 3). In semi-supervised learning, we use a Wide ResNet-28-2 (Zagoruyko & Komodakis, 2016) pre-trained with FixMatch (Sohn et al., 2020), replacing the loss function with the logit-adjusted (LA) cross-entropy loss (Menon et al., 2020) for debiased pseudo-labels. Fine-tuning with SelMix (Alg. 1) uses a cosine learning-rate schedule and the SGD optimizer. In supervised learning, we pre-train models with MiSLAS on ResNet-32 for CIFAR-10 and CIFAR-100, and ResNet-50 for ImageNet-1k. We freeze batch-norm layers and fine-tune the feature extractor with a low learning rate to keep the mean feature statistics z_k stable, as per our theoretical findings. Further details and hyperparameters are provided in Appendix Table G.1. (A hedged fine-tuning setup sketch appears after this table.)
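
Regarding the Pseudocode entry (Algorithm 1, Training through SelMix): the exact update rule is given in the paper, and the sketch below only illustrates the general idea under stated assumptions, namely that class pairs (i, j) are sampled from a softmax over per-pair gain estimates and that the mixup happens in feature space with the first class's label. The helper names (sample_class_pair, feats_by_class, gain), the temperature, and the mixing coefficient are all hypothetical.

```python
# Schematic SelMix-style fine-tuning step (PyTorch). This is an illustrative
# sketch, NOT the authors' Algorithm 1: the gain matrix, pair-sampling rule,
# mixing coefficient, and labeling of the mixed feature are assumptions.
import torch
import torch.nn.functional as F

def sample_class_pair(gain: torch.Tensor, tau: float = 0.1):
    """Sample a class pair (i, j) with probability proportional to
    softmax(gain / tau), so pairs whose mixup is estimated to improve the
    non-decomposable objective the most are mixed more often."""
    probs = F.softmax(gain.flatten() / tau, dim=0)
    idx = int(torch.multinomial(probs, 1))
    k = gain.shape[0]
    return idx // k, idx % k

def selmix_step(linear_head, feats_by_class, gain, optimizer, alpha=0.6):
    """One fine-tuning step: mix a feature of class i with one of class j
    and train the linear head on the mixed feature with label i."""
    i, j = sample_class_pair(gain)
    zi = feats_by_class[i][torch.randint(len(feats_by_class[i]), (1,))]
    zj = feats_by_class[j][torch.randint(len(feats_by_class[j]), (1,))]
    z_mix = alpha * zi + (1.0 - alpha) * zj          # feature-space mixup
    loss = F.cross_entropy(linear_head(z_mix), torch.tensor([i]))
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return float(loss)
```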
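For the Open Datasets and Dataset Splits rows: the paper points to its Table G.1 for the exact dataset parameters. As a reference point, the sketch below shows the standard exponential-profile construction commonly used to build CIFAR-10-LT / CIFAR-100-LT; the imbalance factor of 100 is illustrative and not taken from the paper.

```python
# Standard long-tailed CIFAR construction (exponential class-size profile).
# The imbalance factor is illustrative; the paper's exact dataset parameters
# are listed in its Table G.1.
import numpy as np
from torchvision.datasets import CIFAR10
from torch.utils.data import Subset

def longtail_indices(targets, num_classes=10, imb_factor=100):
    """Keep n_k = n_max * (1/imb_factor)^(k / (num_classes - 1)) samples of
    class k, giving an exponentially decaying class-frequency profile."""
    targets = np.asarray(targets)
    n_max = int((targets == 0).sum())
    keep = []
    for k in range(num_classes):
        n_k = int(n_max * (1.0 / imb_factor) ** (k / (num_classes - 1)))
        cls_idx = np.where(targets == k)[0]
        keep.extend(cls_idx[:n_k].tolist())
    return keep

full_train = CIFAR10(root="./data", train=True, download=True)
lt_indices = longtail_indices(full_train.targets, num_classes=10, imb_factor=100)
lt_train = Subset(full_train, lt_indices)
```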
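For the Hardware Specification row: pre-training reportedly used PyTorch data parallelism across four A5000s. A minimal way to set this up is torch.nn.DataParallel, sketched below; the backbone constructor is a stand-in for the models actually used, and this is one plausible configuration rather than the authors' exact launch script.

```python
# Minimal multi-GPU replication with torch.nn.DataParallel, matching the
# "PyTorch data parallel on 4x A5000" description. The ResNet-50 backbone is
# a placeholder for the paper's actual architectures.
import torch
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(num_classes=1000)            # placeholder backbone
if torch.cuda.device_count() >= 4:
    model = nn.DataParallel(model, device_ids=[0, 1, 2, 3])
model = model.cuda()

# Each forward pass now splits the batch across the listed GPUs automatically.
images = torch.randn(16, 3, 224, 224).cuda()
logits = model(images)
```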
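For the Experiment Setup row: the sketch below shows one way to realize the described fine-tuning recipe in PyTorch (frozen batch-norm layers, SGD with a cosine learning-rate schedule, and a logit-adjusted cross-entropy loss). The toy backbone, learning rate, weight decay, step count, and class priors are placeholders, not values from the paper.

```python
# Hedged sketch of the fine-tuning recipe: freeze batch-norm, use SGD with a
# cosine schedule, and train with logit-adjusted cross-entropy. All numeric
# hyperparameters and the tiny backbone below are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

def freeze_batchnorm(model: nn.Module) -> None:
    """Put every BatchNorm layer in eval mode and stop its gradients, so the
    running statistics (and hence the mean class features z_k) stay fixed."""
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.eval()
            for p in m.parameters():
                p.requires_grad_(False)

def logit_adjusted_ce(logits, targets, log_priors, tau=1.0):
    """Logit-adjusted cross-entropy (Menon et al., 2020): shift the logits by
    tau * log(class prior) before the usual softmax cross-entropy."""
    return F.cross_entropy(logits + tau * log_priors.unsqueeze(0), targets)

# Stand-in backbone with a BatchNorm layer, just to make the sketch runnable.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
freeze_batchnorm(model)

optimizer = torch.optim.SGD((p for p in model.parameters() if p.requires_grad),
                            lr=1e-3, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)

class_counts = torch.tensor([5000.0 * 0.5 ** k for k in range(10)])  # toy long-tailed prior
log_priors = torch.log(class_counts / class_counts.sum())

x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
loss = logit_adjusted_ce(model(x), y, log_priors)
loss.backward(); optimizer.step(); scheduler.step()
```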