Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization

Authors: Yuhang Zang, Hanlin Goh, Joshua M. Susskind, Chen Huang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments validate that our method yields convincing gains in OOD generalization performance in different settings. Code: https://github.com/apple/ml-ogen. Table 1 summarizes the results on 11 datasets. Our ablation studies are conducted using OGEN-CoOp with a meaningfully long learning schedule.
Researcher Affiliation | Collaboration | Yuhang Zang¹, Hanlin Goh², Josh Susskind², Chen Huang²; ¹Nanyang Technological University, ²Apple Inc.
Pseudocode | No | The paper describes its methods using mathematical equations and textual explanations, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code: https://github.com/apple/ml-ogen.
Open Datasets | Yes | For both settings we use 11 datasets: ImageNet (Deng et al., 2009), Caltech101 (Fei-Fei et al., 2004), Oxford Pets (Parkhi et al., 2012), Stanford Cars (Krause et al., 2013), Flowers102 (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014), FGVC-Aircraft (Maji et al., 2013), SUN397 (Xiao et al., 2010), UCF101 (Soomro et al., 2012), DTD (Cimpoi et al., 2014) and EuroSAT (Helber et al., 2019).
Dataset Splits | No | The paper states that 'base and new class splits are used for finetuning and evaluation respectively' and mentions 'train/test data splitting', but it does not give split percentages or say whether (or how) a validation set was used, which limits reproducibility. (See the split sketch after this table.)
Hardware Specification | No | The paper does not specify the hardware used for its experiments (e.g., GPU or CPU models, or cloud instance types).
Software Dependencies | No | The paper mentions using 'CLIP (Radford et al., 2021)' but does not give version numbers for any software dependencies (e.g., PyTorch or other numerical libraries).
Experiment Setup | Yes | CoOp is particularly interesting since its default learning schedule (200 epochs) is much longer than that of CoCoOp and VPT (10 epochs). For fairness, we use the same implementation details of each baseline, including the prompt length, vision backbone in CLIP (Radford et al., 2021) (i.e., ViT-B/16) and train/test data splitting. The reported results are an average over three random seeds. (See the sketches after this table.)
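
The Dataset Splits row notes that the split procedure is not spelled out in the paper. In the CoOp/CoCoOp line of work that OGEN builds on, base-to-new evaluation conventionally splits each dataset's class list in half: the first half (base) is used for finetuning, the second half (new) is held out for evaluation. Below is a minimal sketch under that assumed convention; the authoritative logic should be checked against https://github.com/apple/ml-ogen.

```python
# Hedged sketch of a CoOp-style base/new class split. This is the common
# convention in base-to-new generalization benchmarks, assumed here; it is
# not a verified copy of the OGEN implementation.
import math

def base_new_split(classnames: list[str]) -> tuple[list[str], list[str]]:
    m = math.ceil(len(classnames) / 2)
    return classnames[:m], classnames[m:]  # (base: finetune, new: evaluate)

base, new = base_new_split(sorted(["abbey", "airport", "alley", "arcade"]))
print(base, new)  # ['abbey', 'airport'] ['alley', 'arcade']
```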
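The Experiment Setup row fixes the vision backbone to ViT-B/16 in CLIP. The sketch below loads that backbone with OpenAI's reference `clip` package (pip install git+https://github.com/openai/CLIP.git); how OGEN actually wraps the backbone is defined in the released code, not here.

```python
# Minimal sketch: load the CLIP ViT-B/16 backbone named in the Experiment
# Setup row, using OpenAI's reference `clip` package. The OGEN code may
# construct the model differently; this only illustrates the backbone choice.
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

# Encode a batch of class-name prompts, as CLIP-based finetuning methods do.
texts = clip.tokenize([f"a photo of a {c}" for c in ("cat", "dog")]).to(device)
with torch.no_grad():
    text_features = model.encode_text(texts)
print(text_features.shape)  # torch.Size([2, 512]) for ViT-B/16
```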
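The same row reports results averaged over three random seeds. A sketch of that reporting protocol follows; `run_finetuning` is a hypothetical placeholder for the real training/evaluation entry point in the released code.

```python
# Sketch of the "average over three random seeds" protocol from the
# Experiment Setup row. `run_finetuning` is a hypothetical stand-in for the
# actual entry point in https://github.com/apple/ml-ogen.
import random
import statistics

import numpy as np
import torch

def run_finetuning() -> float:
    # Placeholder (hypothetical): would finetune and return test accuracy.
    return 70.0 + float(torch.rand(()).item())

def run_with_seed(seed: int) -> float:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    return run_finetuning()

accs = [run_with_seed(s) for s in (0, 1, 2)]
print(f"mean accuracy over 3 seeds: {statistics.mean(accs):.2f}")
```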