Controllable Prompt Tuning For Balancing Group Distributional Robustness

Authors: Hoang Phan, Andrew Gordon Wilson, Qi Lei

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct intensive experiments on many different benchmarks, where CPT consistently exhibits superior performance. Our method surpasses the current state-of-the-art baselines on the Waterbirds and CelebA datasets while updating only 0.4% of the parameters. Moreover, it also outperforms recently proposed methods that aim to de-bias Vision Transformer and CLIP models, with minimal training cost. In this section, we evaluate the effectiveness of the proposed CPT method on benchmark image datasets in the presence of spurious features: Waterbirds (Sagawa et al., 2019), CelebA (Liu et al., 2015), MetaShift (Liang & Zou, 2021) and ISIC (Codella et al., 2019). (See the parameter-counting sketch after this table.)
Researcher Affiliation | Academia | New York University. Correspondence to: Hoang Phan <hvp2011@nyu.edu>.
Pseudocode | No | The paper describes the proposed method in Section 4 using mathematical formulations and descriptive text, but it does not include a clearly labeled pseudocode block or algorithm.
Open Source Code | Yes | Our implementation is available at https://github.com/VietHoang1512/CPT.
Open Datasets | Yes | We evaluate the effectiveness of the proposed CPT method on benchmark image datasets in the presence of spurious features: Waterbirds (Sagawa et al., 2019), CelebA (Liu et al., 2015), MetaShift (Liang & Zou, 2021) and ISIC (Codella et al., 2019).
Dataset Splits | Yes | The plots demonstrate that while GroupDRO can improve the performance of the minority group (green line) early in training compared to ERM, it rapidly overfits and fails to maintain this performance over time. After training for ten epochs, the test performance gap between the minority and majority (blue line) groups grows sharply. Vertical lines indicate early stopping epochs as models obtain the best performance on the validation set. We are given a training dataset composed of K groups from the set G, where each group g ∈ G consists of n_g instances sampled from the probability distribution P_g(X, Y). Per-group split sizes: # Train data 3,498 (73%), 184 (4%), 56 (1%), 1,057 (22%); # Val data 467, 466, 133, 133; # Test data 2,255, 2,255, 642, 642. (See the group-counting sketch after this table.)
Hardware Specification | Yes | Figure 7: Total running time and the time for solving the linear programming (LP) problem per epoch with Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz and NVIDIA A100-SXM4-80GB GPU. Results are averaged over 5 epochs. (See the LP-timing sketch after this table.)
Software Dependencies | No | The paper mentions 'CLIP (Radford et al., 2021)', 'ResNet50 (He et al., 2016)', and 'ViT-B/16 (Dosovitskiy et al., 2020)' as models used, but does not provide specific version numbers for these software components or any other libraries such as PyTorch, TensorFlow, or Python itself. It cites 'Cvxopt: Convex optimization. Astrophysics Source Code Library, pp. ascl 2008, 2020.', but this is a citation to a software paper, not an explicit dependency statement with a version for their own implementation.
Experiment Setup | Yes | Table 6: Hyperparameters for different experiments throughout our paper. We report the hyper-parameters selected for our proposed method after performing grid-search. It lists specific values for Learning Rate, Weight Decay, Batch Size, # Epochs, and Prompt length for different datasets and architectures. (See the grid-search sketch after this table.)
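
The 0.4% figure in the Research Type row is the share of model parameters that CPT updates. Below is a minimal PyTorch sketch of how such a fraction can be computed once a backbone is frozen and only a prompt is left trainable; the toy backbone and prompt here are illustrative assumptions, not the CPT implementation.

```python
import torch
import torch.nn as nn


def trainable_fraction(model: nn.Module) -> float:
    """Share of parameters that will receive gradient updates."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable / total


# Illustrative setup: freeze a small backbone, train only a prompt embedding.
backbone = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
for p in backbone.parameters():
    p.requires_grad_(False)

model = nn.Module()
model.backbone = backbone
model.prompt = nn.Parameter(torch.randn(16, 512))  # hypothetical prompt of length 16

print(f"Trainable fraction: {trainable_fraction(model):.2%}")
```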
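
The per-group sizes in the Dataset Splits row (train/val/test counts for four groups) can be recomputed from (class label, spurious attribute) annotations. A small sketch of that bookkeeping, assuming binary labels and a binary background attribute as in Waterbirds; the variable names are not taken from the paper's code.

```python
from collections import Counter


def group_counts(labels, attributes):
    """Count examples per (label, spurious attribute) group and report shares."""
    counts = Counter(zip(labels, attributes))
    total = sum(counts.values())
    for group in sorted(counts):
        n = counts[group]
        print(f"group {group}: {n} examples ({n / total:.0%})")


# Toy annotations: class label paired with a binary background attribute.
labels = [0, 0, 0, 1, 1, 0, 1, 1, 0, 0]
backgrounds = [0, 0, 1, 0, 1, 0, 1, 1, 0, 0]
group_counts(labels, backgrounds)
```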
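
The Hardware row compares total epoch time with the time spent solving a per-epoch linear program. The exact LP from the paper is not reproduced here; the sketch below only illustrates timing a small stand-in LP over group weights on the probability simplex with scipy.optimize.linprog (the per-group losses and the maximization objective are assumptions).

```python
import time

import numpy as np
from scipy.optimize import linprog

# Stand-in per-group losses; a DRO-style step up-weights the worst-off groups.
losses = np.array([0.9, 0.4, 1.2, 0.6])
c = -losses  # linprog minimizes, so negate to maximize the weighted loss

# Group weights live on the probability simplex: w >= 0, sum(w) = 1.
num_groups = len(losses)
A_eq = np.ones((1, num_groups))
b_eq = np.array([1.0])
bounds = [(0.0, 1.0)] * num_groups

start = time.perf_counter()
result = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
elapsed = time.perf_counter() - start

print(f"LP solve time: {elapsed:.4f}s")
print("group weights:", result.x)
```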
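
The Experiment Setup row refers to Table 6, which records per-dataset values for learning rate, weight decay, batch size, number of epochs, and prompt length chosen by grid search. The sketch below shows the shape of such a search; the value ranges and the train_and_eval callback are placeholders, not the paper's reported settings.

```python
from itertools import product

# Placeholder search space; the selected values live in Table 6 of the paper.
GRID = {
    "learning_rate": [1e-3, 1e-2],
    "weight_decay": [0.0, 1e-4],
    "batch_size": [32, 128],
    "num_epochs": [20, 50],
    "prompt_length": [4, 16],
}


def grid_search(train_and_eval, grid=GRID):
    """Try every configuration; keep the one with the best validation score."""
    best_score, best_cfg = float("-inf"), None
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = train_and_eval(cfg)  # e.g. worst-group validation accuracy
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score
```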