Debiased Fine-Tuning for Vision-Language Models by Prompt Regularization
Authors: Beier Zhu, Yulei Niu, Saeil Lee, Minhoe Hur, Hanwang Zhang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive evaluations for ProReg on various out-of-distribution benchmarks, including BAR (Nam et al. 2020), NICO (He, Shen, and Cui 2020), PACS (Li et al. 2017) and DomainNet (Peng et al. 2019) for image classification tasks and VQA-CP (Agrawal et al. 2018) for visual question answering tasks. We demonstrate that: 1) ProReg consistently outperforms zero-shot prompt, conventional fine-tuning, and prompt tuning on all the datasets, 2) ProReg achieves compelling performance in both out-of-distribution and in-distribution settings. |
| Researcher Affiliation | Collaboration | Beier Zhu¹, Yulei Niu²*, Saeil Lee³, Minhoe Hur⁴, Hanwang Zhang¹ (¹Nanyang Technological University; ²Columbia University; ³HMGICS AIR Center; ⁴AIRS Company, Hyundai Motor Group) |
| Pseudocode | No | The paper provides mathematical formulations for its loss functions but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. (A hedged sketch of such a loss appears after the table.) |
| Open Source Code | No | The paper does not contain an explicit statement about the release of its source code or a link to a code repository. |
| Open Datasets | Yes | We conduct extensive evaluations for ProReg on various out-of-distribution benchmarks, including BAR (Nam et al. 2020), NICO (He, Shen, and Cui 2020), PACS (Li et al. 2017) and DomainNet (Peng et al. 2019) for image classification tasks and VQA-CP (Agrawal et al. 2018) for visual question answering tasks. |
| Dataset Splits | Yes | PACS (Li et al. 2017) covers photo, sketch, cartoon and painting domains. The model is trained and validated on any three seen domains, then tested on the remaining unseen domain. |
| Hardware Specification | No | The paper does not specify any hardware details such as specific GPU models, CPU models, or cloud computing instances used for running the experiments. |
| Software Dependencies | No | The paper mentions using AdamW optimizer and refers to prior work for fine-tuning settings, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For ViLT-based models, we followed the original fine-tuning settings in (Kim, Son, and Kim 2021), which adopt the ViLT-B/32 model with the AdamW (Loshchilov and Hutter 2018) optimizer for 10 epochs on all datasets. For CLIP-based models, we used the ViT-B/32 backbone and adopted the ViLT fine-tuning settings, including the training epochs, optimizer, warmup schedule, image pre-processing, etc. α is set to 2 for all experiments. (See the configuration sketch after the table.) |
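
As the Pseudocode row notes, the paper states its method through loss formulations rather than an algorithm block. For illustration only, below is a minimal sketch of prompt-regularized fine-tuning, assuming the regularizer is a KL term that pulls the fine-tuned model's predictions toward the frozen zero-shot prompt's predictions, weighted by the α = 2 hyperparameter quoted in the setup row. The paper's exact weighting scheme is not reproduced in the table and may differ.

```python
import torch
import torch.nn.functional as F

def prompt_regularized_loss(logits_ft, logits_zs, targets, alpha=2.0):
    """Hypothetical sketch of a prompt-regularized fine-tuning loss.

    logits_ft: logits from the model being fine-tuned
    logits_zs: logits from the frozen zero-shot (prompt-based) model
    targets:   ground-truth class indices
    alpha:     regularization weight (the paper sets alpha = 2)
    """
    # Standard task loss against the hard labels.
    task_loss = F.cross_entropy(logits_ft, targets)
    # KL term pulling the fine-tuned predictions toward the frozen
    # zero-shot prompt predictions (assumed form of the regularizer).
    reg_loss = F.kl_div(
        F.log_softmax(logits_ft, dim=-1),
        F.softmax(logits_zs, dim=-1),
        reduction="batchmean",
    )
    return task_loss + alpha * reg_loss
```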
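The Experiment Setup row fixes only the optimizer (AdamW), backbones (ViLT-B/32, ViT-B/32) and epoch count (10); learning rate, warmup length, and decay shape below are assumptions in the style of common ViLT fine-tuning recipes, not values from the paper.

```python
import torch
import torch.nn as nn

# Stand-in for a CLIP/ViLT backbone with a classification head;
# the real models are not reproduced here.
model = nn.Linear(512, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

num_epochs = 10                 # from the paper's setup
steps_per_epoch = 1000          # depends on dataset and batch size
total_steps = num_epochs * steps_per_epoch
warmup_steps = int(0.1 * total_steps)  # assumed 10% warmup

def lr_lambda(step):
    # Linear warmup followed by linear decay (assumed schedule).
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```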