Federated Text-driven Prompt Generation for Vision-Language Models
Authors: Chen Qiu, Xingyu Li, Chaithanya Kumar Mummadi, Madan Ravi Ganesh, Zhenzhen Li, Lu Peng, Wan-Yi Lin
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our comprehensive empirical evaluations on nine diverse image classification datasets show that our method is superior to existing federated prompt learning methods, achieving better overall generalization on both seen and unseen classes, as well as datasets. |
| Researcher Affiliation | Collaboration | Chen Qiu, Chaithanya Kumar Mummadi, Madan Ravi Ganesh, Zhenzhen Li, and Wan-Yi Lin: Bosch Center for AI, USA. Xingyu Li and Lu Peng: Tulane University. |
| Pseudocode | Yes | Algorithm 1: FedTPG Algorithm. A minimal sketch of one communication round appears after the table. |
| Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the methodology described. |
| Open Datasets | Yes | We employ nine image datasets including Caltech101 (Fei-Fei et al., 2004), Oxford Pets (Parkhi et al., 2012), Stanford Cars (Krause et al., 2013), Flowers102 (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014), FGVCAircraft (Maji et al., 2013), SUN397 (Xiao et al., 2010), UCF101 (Soomro et al., 2012), and DTD (Cimpoi et al., 2014). For evaluating the generalization to unseen datasets, we train all models on ImageNet, and test the model on two benchmarks: (1) four variants of ImageNet including ImageNetV2, ImageNet-Sketch, ImageNet-A, and ImageNet-R; (2) ten unseen datasets including the nine datasets used in Table 1 and EuroSAT (Helber et al., 2019). |
| Dataset Splits | Yes | We split the classes of each dataset equally into two groups, one as base classes and the other as new classes. Images from base classes are available for training, while images from new classes are used for evaluating generalization performance. We report classification accuracies on the clients' local classification tasks, on the base classes (combining classes from multiple clients), and on the new classes in Table 1. We report the harmonic mean (HM) of these three accuracies to show overall performance. All results are averaged over three independent runs. In the FL data partition process for Table 1, we first split the classes of the nine considered classification datasets equally into two groups, D_s and D_u, denoting the seen and unseen groups respectively. Then we split the classes within D_s across the 30 remote clients, where each remote client has n = 20 classes in its local dataset D_i. For each class, the number of image-text paired data shots is set to 8. A sketch of this partition appears after the table. |
| Hardware Specification | No | The paper states: 'All methods are built on a frozen CLIP with ViT-B/16 backbone.' This refers to the model architecture used, not the specific hardware (GPU, CPU, memory) on which the experiments were run. No other hardware specifications are provided. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies. It mentions using an 'SGD optimizer' but gives no version details for frameworks such as PyTorch or TensorFlow, or for a programming language such as Python. |
| Experiment Setup | Yes | Implementation Details. All methods are built on a frozen CLIP with ViT-B/16 backbone. FedTPG learns a unified prompt generator parameterized by a four-head cross-attention layer with layer norm and an MLP (h_ϕ) consisting of two linear layers with ReLU. The dimension of the vectors Q, K_T, V_T in the cross-attention layer, and of the linear layers in h_ϕ, is 512. The length m of the prompt vectors is 4, and the dimension d is 512. For each compared FL approach and each classification task, the learning rate of the SGD optimizer was set via grid search to η = 0.003, with a decay rate of 1e-5 and a momentum of 0.9. The local SGD training step is set to K = 1. The number of communication rounds is 500. The batch size is 200. By default, all experimental results in the paper are obtained by averaging over three independent runs. A hedged sketch of this prompt generator follows the table. |
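For the Pseudocode row, the following is a minimal FedAvg-style sketch of one FedTPG communication round, assuming only the prompt generator's parameters are communicated while the CLIP backbone stays frozen. `PromptGenerator`, `local_step`, and the client loaders are hypothetical placeholders, not the authors' code (no code release is stated).

```python
# Minimal sketch of one FedTPG communication round (FedAvg-style), assuming
# only the prompt generator's weights are communicated; the CLIP backbone is
# frozen. `local_step` (a CLIP-style classification loss computed with the
# generated prompts) and the client loaders are hypothetical placeholders.
import copy
import torch

def fedtpg_round(global_gen, client_loaders, lr=0.003, local_steps=1):
    client_states = []
    for loader in client_loaders:
        local_gen = copy.deepcopy(global_gen)
        opt = torch.optim.SGD(local_gen.parameters(), lr=lr, momentum=0.9,
                              weight_decay=1e-5)  # decay rate read as weight decay (assumption)
        for _, (images, class_names) in zip(range(local_steps), loader):
            loss = local_step(local_gen, images, class_names)  # hypothetical helper
            opt.zero_grad()
            loss.backward()
            opt.step()
        client_states.append(local_gen.state_dict())
    # Server aggregation: uniform average of the clients' generator weights.
    avg = {k: torch.stack([s[k] for s in client_states]).mean(dim=0)
           for k in client_states[0]}
    global_gen.load_state_dict(avg)
    return global_gen
```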
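For the Dataset Splits row, the partition can be sketched as below. The paper specifies only the counts (30 clients, n = 20 classes each, 8 shots per class); the uniform slicing of the seen classes and the `sample_shots` helper are illustrative assumptions.

```python
# Sketch of the FL data partition: classes split 50/50 into seen (D_s) and
# unseen (D_u) groups, D_s spread over 30 clients with 20 classes and
# 8 image-text shots per class. `sample_shots` is a hypothetical helper.
import random

def partition_classes(all_classes, num_clients=30, classes_per_client=20,
                      shots=8, seed=0):
    rng = random.Random(seed)
    classes = list(all_classes)
    rng.shuffle(classes)
    half = len(classes) // 2
    seen, unseen = classes[:half], classes[half:]  # D_s and D_u
    clients = []
    for i in range(num_clients):
        local = seen[i * classes_per_client:(i + 1) * classes_per_client]
        # 8 image-text paired shots per class on each client.
        clients.append({c: sample_shots(c, shots) for c in local})
    return clients, unseen

def harmonic_mean(accs):
    # HM of the local/base/new accuracies reported in Table 1.
    return len(accs) / sum(1.0 / a for a in accs)
```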
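For the Experiment Setup row, the prompt generator admits a compact sketch. The stated pieces come from the paper (four-head cross-attention with layer norm, a two-layer ReLU MLP h_ϕ, dimension 512, prompt length m = 4); the residual connection and the exact attention wiring are assumptions.

```python
# Hedged sketch of the FedTPG prompt generator: learnable query vectors Q
# cross-attend over CLIP text embeddings (keys/values K_T, V_T), followed by
# layer norm and a two-layer ReLU MLP h_phi. Residual wiring is an assumption.
import torch
import torch.nn as nn

class PromptGenerator(nn.Module):
    def __init__(self, dim=512, prompt_len=4, heads=4):
        super().__init__()
        self.query = nn.Parameter(torch.randn(prompt_len, dim))  # Q, length m = 4
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))  # h_phi

    def forward(self, text_emb):
        # text_emb: (num_classes, dim) CLIP text embeddings -> K_T, V_T.
        q = self.query.unsqueeze(0)    # (1, m, dim)
        kv = text_emb.unsqueeze(0)     # (1, num_classes, dim)
        out, _ = self.attn(q, kv, kv)
        out = self.norm(out + q)       # residual + layer norm (assumed wiring)
        return self.mlp(out).squeeze(0)  # (m, dim) prompt vectors
```

Under these assumptions, a forward pass over the CLIP text embeddings of a client's n = 20 class names yields the m = 4 prompt vectors that condition CLIP's text encoder for that client's task.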