Selective Mixup Fine-Tuning for Optimizing Non-Decomposable Objectives
Authors: Shrinivas Ramasubramanian, Harsh Rangwani, Sho Takemori, Kunal Samanta, Yuhei Umeda, Venkatesh Babu Radhakrishnan
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We comprehensively evaluate our technique against the existing empirical and theoretically principled methods on standard benchmark datasets for imbalanced classification. We find that proposed SelMix fine-tuning significantly improves the performance for various practical non-decomposable objectives across benchmarks. |
| Researcher Affiliation | Collaboration | Shrinivas Ramasubramanian¹, Harsh Rangwani², Sho Takemori³, Kunal Samanta², Yuhei Umeda³, R. Venkatesh Babu² (¹Fujitsu Research of India, ²Indian Institute of Science, ³Fujitsu Limited) |
| Pseudocode | Yes | Algorithm 1: Training through SelMix |
| Open Source Code | Yes | Code will be available here: github.com/val-iisc/SelMix/. |
| Open Datasets | Yes | We comprehensively evaluate our technique against the existing empirical and theoretically principled methods on standard benchmark datasets for imbalanced classification. We find that proposed SelMix fine-tuning significantly improves the performance for various practical non-decomposable objectives across benchmarks. For the experiments on the long-tailed supervised dataset, we consider the Long-Tailed versions of CIFAR-10, 100, and ImageNet-1k. |
| Dataset Splits | Yes | We comprehensively evaluate our technique against the existing empirical and theoretically principled methods on standard benchmark datasets for imbalanced classification. For the experiments on the long-tailed supervised dataset, we consider the Long-Tailed versions of CIFAR-10, 100, and ImageNet-1k. The parameters for the datasets are available in Tab. G.1. |
| Hardware Specification | Yes | The experiments were done on an Nvidia A5000 GPU (24 GB). While the fine-tuning was done on a single A5000, the pre-training was done using PyTorch data parallel on 4×A5000. |
| Software Dependencies | No | The paper mentions software like "PyTorch" and specific models like "Wide ResNet-28-2", "FixMatch", "SGD optimizer", and "logit adjusted (LA) cross-entropy loss", but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Training Details: Our classifier comprises a feature extractor g : X → ℝᵈ and a linear layer with weight W (see Sec. 3). In semi-supervised learning, we use the pre-trained Wide ResNet-28-2 (Zagoruyko & Komodakis, 2016) with FixMatch (Sohn et al., 2020), replacing the loss function with the logit adjusted (LA) cross-entropy loss (Menon et al., 2020) for debiased pseudo-labels. Fine-tuning with SelMix (Alg. 1) uses a cosine learning-rate schedule and the SGD optimizer. In supervised learning, we pre-train models with MiSLAS on ResNet-32 for CIFAR-10, CIFAR-100, and ResNet-50 for ImageNet-1k. We freeze batch norm layers and fine-tune the feature extractor with a low learning rate to maintain stable mean feature statistics zₖ, as per our theoretical findings. Further details and hyperparameters are provided in appendix Table G.1. |
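The logit-adjusted (LA) cross-entropy cited in the setup row shifts each class logit by τ·log(class prior) before the softmax (Menon et al., 2020), so tail-class samples incur larger losses and thus larger margins. A minimal pure-Python sketch of that loss, assuming known class priors (the function name and signature are illustrative, not the authors' code):

```python
import math

def logit_adjusted_ce(logits, label, class_priors, tau=1.0):
    """Logit-adjusted cross-entropy (Menon et al., 2020): add
    tau * log(prior) to each logit, then take standard cross-entropy."""
    adjusted = [z + tau * math.log(p) for z, p in zip(logits, class_priors)]
    # Numerically stable log-sum-exp over the adjusted logits.
    m = max(adjusted)
    log_sum = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_sum - adjusted[label]

# With a 90/10 class imbalance and identical logits, a tail-class
# sample is penalized more than a head-class one.
head_loss = logit_adjusted_ce([0.0, 0.0], 0, [0.9, 0.1])
tail_loss = logit_adjusted_ce([0.0, 0.0], 1, [0.9, 0.1])
```

With uniform priors the adjustment shifts every logit equally and the loss reduces to plain cross-entropy, which is a quick sanity check for any reimplementation.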