Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization
Authors: Yuhang Zang, Hanlin Goh, Joshua M. Susskind, Chen Huang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments validate that our method yields convincing gains in OOD generalization performance in different settings. Code: https://github.com/apple/ml-ogen. Table 1 summarizes the results on 11 datasets. Our ablation studies are conducted using OGEN-CoOp with a meaningfully long learning schedule. |
| Researcher Affiliation | Collaboration | Yuhang Zang¹, Hanlin Goh², Josh Susskind², Chen Huang²; ¹Nanyang Technological University, ²Apple Inc. |
| Pseudocode | No | The paper describes its methods using mathematical equations and textual explanations, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code: https://github.com/apple/ml-ogen. |
| Open Datasets | Yes | For both settings we use 11 datasets: ImageNet (Deng et al., 2009), Caltech101 (Fei-Fei et al., 2004), OxfordPets (Parkhi et al., 2012), StanfordCars (Krause et al., 2013), Flowers102 (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014), FGVC-Aircraft (Maji et al., 2013), SUN397 (Xiao et al., 2010), UCF101 (Soomro et al., 2012), DTD (Cimpoi et al., 2014) and EuroSAT (Helber et al., 2019). |
| Dataset Splits | No | The paper states that 'base and new class splits are used for finetuning and evaluation respectively' and refers to 'train/test data splitting', but it does not specify split percentages or methodology, nor whether and how a validation split was used, which limits reproducibility. |
| Hardware Specification | No | The paper does not explicitly mention the specific hardware (e.g., GPU models, CPU types, or cloud computing instances with their specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'CLIP (Radford et al., 2021)' but does not provide specific version numbers for any software libraries or dependencies (e.g., PyTorch version, TensorFlow version, or other numerical libraries with their versions). |
| Experiment Setup | Yes | CoOp is particularly interesting since its default learning schedule (200 epochs) is much longer than that of CoCoOp and VPT (10 epochs). For fairness, we use the same implementation details of each baseline, including the prompt length, vision backbone in CLIP (Radford et al., 2021) (i.e., ViT-B/16) and train/test data splitting. The reported results are an average over three random seeds. |
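
The Experiment Setup row can be summarized as a small configuration sketch. This is a hypothetical Python illustration: the baseline names, the ViT-B/16 backbone, the 200- vs. 10-epoch schedules, and the three-seed averaging are taken from the quoted setup, while the class and function names (`FinetuneConfig`, `run_one`, `averaged_accuracy`) are invented placeholders and are not the authors' released code.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class FinetuneConfig:
    baseline: str                 # "CoOp", "CoCoOp", or "VPT"
    epochs: int                   # default learning schedule of the baseline
    backbone: str = "ViT-B/16"    # CLIP vision backbone used for all baselines
    seeds: tuple = (0, 1, 2)      # reported results are averaged over three random seeds


# CoOp keeps its much longer default schedule (200 epochs) versus 10 for CoCoOp/VPT.
CONFIGS = [
    FinetuneConfig("CoOp", epochs=200),
    FinetuneConfig("CoCoOp", epochs=10),
    FinetuneConfig("VPT", epochs=10),
]


def run_one(cfg: FinetuneConfig, seed: int) -> float:
    """Placeholder: finetune on the base-class split, evaluate on the new-class split."""
    # Replace with actual training/evaluation; returns a dummy accuracy here.
    return 0.0


def averaged_accuracy(cfg: FinetuneConfig) -> float:
    # Average over the three random seeds, matching the paper's reporting protocol.
    return mean(run_one(cfg, s) for s in cfg.seeds)


if __name__ == "__main__":
    for cfg in CONFIGS:
        print(cfg.baseline, averaged_accuracy(cfg))
```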