Debiased Fine-Tuning for Vision-Language Models by Prompt Regularization
Authors: Beier Zhu, Yulei Niu, Saeil Lee, Minhoe Hur, Hanwang Zhang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive evaluations for ProReg on various out-of-distribution benchmarks, including BAR (Nam et al. 2020), NICO (He, Shen, and Cui 2020), PACS (Li et al. 2017) and DomainNet (Peng et al. 2019) for image classification tasks and VQA-CP (Agrawal et al. 2018) for visual question answering tasks. We demonstrate that: 1) ProReg consistently outperforms zero-shot prompt, conventional fine-tuning, and prompt tuning on all the datasets, 2) ProReg achieves compelling performance in both out-of-distribution and in-distribution settings. |
| Researcher Affiliation | Collaboration | Beier Zhu¹, Yulei Niu²*, Saeil Lee³, Minhoe Hur⁴, Hanwang Zhang¹ (¹Nanyang Technological University; ²Columbia University; ³HMGICS AIR Center; ⁴AIRS Company, Hyundai Motor Group) |
| Pseudocode | No | The paper provides mathematical formulations for its loss functions but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. (A hedged sketch of such a loss appears after the table.) |
| Open Source Code | No | The paper does not contain an explicit statement about the release of its source code or a link to a code repository. |
| Open Datasets | Yes | We conduct extensive evaluations for ProReg on various out-of-distribution benchmarks, including BAR (Nam et al. 2020), NICO (He, Shen, and Cui 2020), PACS (Li et al. 2017) and DomainNet (Peng et al. 2019) for image classification tasks and VQA-CP (Agrawal et al. 2018) for visual question answering tasks. |
| Dataset Splits | Yes | PACS (Li et al. 2017) covers photo, sketch, cartoon and painting domains. The model is trained and validated on any three seen domains, then tested on the remaining unseen domain. |
| Hardware Specification | No | The paper does not specify any hardware details such as specific GPU models, CPU models, or cloud computing instances used for running the experiments. |
| Software Dependencies | No | The paper mentions using AdamW optimizer and refers to prior work for fine-tuning settings, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For ViLT-based models, we followed the original fine-tuning settings in (Kim, Son, and Kim 2021), which adopt the ViLT-B/32 model with the AdamW (Loshchilov and Hutter 2018) optimizer for 10 epochs on all datasets. For CLIP-based models, we used the ViT-B/32 backbone and adopted the ViLT fine-tuning settings, including the training epochs, optimizer, warmup schedule, image pre-processing, etc. α is set to 2 for all experiments. (See the configuration sketch after the table.) |
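
As the Pseudocode row notes, the paper states its method through loss formulations rather than an algorithm block. For illustration only, below is a minimal sketch of prompt-regularized fine-tuning, assuming the regularizer is a KL term that pulls the fine-tuned model's predictions toward the frozen zero-shot prompt's predictions, weighted by the α = 2 hyperparameter quoted in the setup row. The paper's exact weighting scheme is not reproduced in the table and may differ.

```python
import torch
import torch.nn.functional as F

def prompt_regularized_loss(logits_ft, logits_zs, targets, alpha=2.0):
    """Hypothetical sketch of a prompt-regularized fine-tuning loss.

    logits_ft: logits from the model being fine-tuned
    logits_zs: logits from the frozen zero-shot (prompt-based) model
    targets:   ground-truth class indices
    alpha:     regularization weight (the paper sets alpha = 2)
    """
    # Standard task loss against the hard labels.
    task_loss = F.cross_entropy(logits_ft, targets)
    # KL term pulling the fine-tuned predictions toward the frozen
    # zero-shot prompt predictions (assumed form of the regularizer).
    reg_loss = F.kl_div(
        F.log_softmax(logits_ft, dim=-1),
        F.softmax(logits_zs, dim=-1),
        reduction="batchmean",
    )
    return task_loss + alpha * reg_loss
```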
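The Experiment Setup row fixes only the optimizer (AdamW), backbones (ViLT-B/32, ViT-B/32) and epoch count (10); learning rate, warmup length, and decay shape below are assumptions in the style of common ViLT fine-tuning recipes, not values from the paper.

```python
import torch
import torch.nn as nn

# Stand-in for a CLIP/ViLT backbone with a classification head;
# the real models are not reproduced here.
model = nn.Linear(512, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

num_epochs = 10                 # from the paper's setup
steps_per_epoch = 1000          # depends on dataset and batch size
total_steps = num_epochs * steps_per_epoch
warmup_steps = int(0.1 * total_steps)  # assumed 10% warmup

def lr_lambda(step):
    # Linear warmup followed by linear decay (assumed schedule).
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```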