Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification

Authors: Wenshuo Peng, Kaipeng Zhang, Yue Yang, Hao Zhang, Yu Qiao

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments showing that our proposed DAT approach meaningfully improves performance on various benchmark datasets over traditional adaptation methods.
Researcher Affiliation | Collaboration | 1OpenGVLab, Shanghai AI Laboratory; 2Shanghai Jiao Tong University; 3Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University
Pseudocode | No | The paper includes figures (e.g., Figures 1, 2, 3, 4) that illustrate the proposed modules and pipelines. However, it does not contain any formal pseudocode blocks or sections explicitly labeled as 'Algorithm'.
Open Source Code | No | The paper does not provide an explicit statement about releasing open-source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We evaluate our method on eight widely-used image classification benchmarks, including CIFAR10 (Krizhevsky, Hinton et al. 2009), CIFAR100 (Krizhevsky, Hinton et al. 2009), FGVC-Aircraft (Maji et al. 2013), Oxford Pets (Parkhi et al. 2012), Stanford Cars (Krause et al. 2013), DTD (Cimpoi et al. 2014), Food101 (Bossard, Guillaumin, and Gool 2014), and SUN397 (Xiao et al. 2010). ... a ResNet-50 (He et al. 2016) image encoder pre-trained on YFCC15M (Thomee et al. 2016).
Dataset Splits | No | The paper does not explicitly provide the train/validation/test splits (e.g., percentages or counts) needed to reproduce the experiments. It mentions using 'downstream labeled data' and 'pre-training bank data' but does not state how the overall data was divided for training and validation.
Hardware Specification | No | The paper mentions models like 'ResNet-50' and 'ViT-B-32' but does not specify any hardware details such as GPU models, CPU specifications, or memory used for running the experiments.
Software Dependencies | No | The paper mentions software components like 'CLIP' and 'OpenCLIP' as foundation models/frameworks but does not provide specific version numbers for these or any other software dependencies required to replicate the experiments.
Experiment Setup | Yes | During training, we set λ = 1 with T_thresh = 0.95 due to the high noise level of the pre-training data, and we also set η = 1. In our experiments, we take fine-tuning as an example to report test results. We train for 12 epochs because of the fast convergence speed of the pre-trained model.
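The reported hyperparameters (λ, T_thresh, η, epoch count) can be collected into a minimal configuration sketch for anyone attempting a reimplementation. The paper releases no code, so every name below is illustrative rather than taken from an official DAT codebase:

```python
# Hypothetical configuration assembled from the values quoted in the paper.
# Key names are our own invention; only the numeric values come from the text.
DAT_CONFIG = {
    "lambda_weight": 1.0,   # λ: weight on the traceback objective
    "t_thresh": 0.95,       # confidence threshold, set high due to noisy pre-training data
    "eta": 1.0,             # η, as reported
    "epochs": 12,           # short schedule; pre-trained model converges quickly
    "adaptation": "fine-tuning",  # the adaptation mode used for reported results
}

def validate_config(cfg: dict) -> dict:
    """Sanity-check the reported settings before launching a run."""
    if not 0.0 < cfg["t_thresh"] <= 1.0:
        raise ValueError("t_thresh must behave like a confidence in (0, 1]")
    if cfg["epochs"] <= 0:
        raise ValueError("epochs must be positive")
    return cfg

validate_config(DAT_CONFIG)
```

A validated config like this makes the reproducibility gap explicit: everything not listed here (optimizer, learning rate, batch size, hardware) is unspecified in the paper.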