Realistic Unsupervised CLIP Fine-tuning with Universal Entropy Optimization

Authors: Jian Liang, Lijun Sheng, Zhengbo Wang, Ran He, Tieniu Tan

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments across 15 domains and 4 different types of prior knowledge validate the effectiveness of UEO compared to baseline methods. The code is at https://github.com/tim-learn/UEO. ... Section 4, Experiments
Researcher Affiliation | Academia | ¹NLPR & MAIS, Institute of Automation, Chinese Academy of Sciences, China; ²School of Artificial Intelligence, University of Chinese Academy of Sciences; ³University of Science and Technology of China; ⁴Nanjing University.
Pseudocode | Yes | A. Pseudocode: To facilitate a better understanding of our problem setup and proposed method (UEO), we provide the pseudocode below. Throughout this paper, we utilize two different scores, namely ACC and AUC, calculated in Line 15 and Line 17, respectively, to evaluate all methods under the U2-FT framework. Algorithm 1: Universal Entropy Optimization (UEO) for Unsupervised Universal Fine-Tuning (U2-FT). (See the training-objective sketch after this table.)
Open Source Code | Yes | The code is at https://github.com/tim-learn/UEO.
Open Datasets | Yes | Datasets. We employ four popular domain adaptation datasets, i.e., Office (Saenko et al., 2010), which comprises 3 domains of 31 object categories; Office-Home (Venkateswara et al., 2017), which encompasses 65 categories across 4 domains; VisDA-C (Peng et al., 2017), which includes 2 distant domains of 12 classes; and DomainNet (DN) (Peng et al., 2019), which contains 345 classes distributed across 6 styles. ... To validate the effectiveness of the proposed method (UEO), we additionally utilize two widely recognized classification datasets (i.e., ImageNet and Food101) and present the results under four category shifts in Table 9.
Dataset Splits | Yes | Unlike DANCE (Saito et al., 2020), which assesses accuracies using training unlabeled data, we opt for an independent test set comprising both ID and OOD samples. ... F. Information of data split for different shifts: In this section, we present specific information about Lp (the target classes of interest), Lu (the label set of the training data), and Le (the label set of the evaluation data) for all datasets. Table 11: Detailed information about the four category shifts in the training stage and evaluation stage. (See the split-construction sketch after this table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU types, or cloud computing instance specifications.
Software Dependencies | No | The paper mentions using 'pre-trained ResNet-50 and ViT-B/16 models provided by the official CLIP repository' and an 'SGD optimizer', but does not specify version numbers for programming languages or software libraries (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | Implementation details. For all experiments, we utilize the pre-trained ResNet-50 and ViT-B/16 models provided by the official CLIP repository (Radford et al., 2021). The epoch number is set to 50 for small-size datasets (i.e., Office and Office-Home) and 5 for large-size datasets (i.e., VisDA-C and DomainNet), and the learned model from the last epoch is chosen for a fair evaluation. During training, we use an SGD optimizer with an initial learning rate of 1e-4 for both encoders, except for EntMin, which uses a learning rate of 1e-5. We also employ a cosine scheduler to gradually decrease the learning rate. The parameters optimized in all methods include the prompt of the text encoder and the affine parameters in the normalization layers (i.e., BatchNorm in ResNet-50 and LayerNorm in ViT-B/16) of the visual encoder. The context length of the prompt is fixed at 4 and takes the default initialization 'a photo of a'. We reproduce all methods' loss functions using the hyperparameters provided in their respective papers, with the experimental results averaged over two different seeds. (See the optimizer sketch after this table.)
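
To make the quoted Algorithm 1 more concrete, below is a minimal PyTorch sketch of a weighted entropy objective in the spirit of UEO: entropy is minimized for samples the model is confident about (likely ID) and maximized for unconfident ones (likely OOD). The confidence measure (max softmax probability), the detaching of the weights, and their normalization are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def entropy(probs, eps=1e-8):
    # Shannon entropy of each row of a (batch, num_classes) probability matrix.
    return -(probs * (probs + eps).log()).sum(dim=1)

def ueo_style_loss(logits):
    probs = logits.softmax(dim=1)
    ent = entropy(probs)                            # per-sample entropy
    # Max softmax probability as ID-likeness; treated as a constant weight
    # here (an assumption for this sketch).
    conf = probs.max(dim=1).values.detach()
    w = conf / conf.sum()                           # weights for likely-ID samples
    v = (1.0 - conf) / (1.0 - conf).sum()           # weights for likely-OOD samples
    # Minimize entropy on likely-ID samples, maximize it on likely-OOD ones.
    return (w * ent).sum() - (v * ent).sum()

# Example: logits would come from CLIP image/text similarity, e.g.
# loss = ueo_style_loss(image_features @ text_features.T / temperature)
```

Under U2-FT, the fine-tuned model is then scored with ACC on ID test samples and AUC for separating ID from OOD samples, as in Lines 15 and 17 of the quoted Algorithm 1.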
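The four category shifts referenced in the Dataset Splits row can be pictured as different relations between Lp (the target classes of interest) and Lu (the labels actually present in the unlabeled training data). The sketch below is a hypothetical construction: the concrete per-dataset class indices follow Table 11 of the paper, and the fractions used here are chosen only to make the four relations concrete.

```python
def make_category_shift(num_classes, shift):
    """Illustrative label sets for one dataset under a given category shift.
    Returns (lp, lu, le): classes of interest, training labels, eval labels."""
    classes = list(range(num_classes))
    half, quarter = num_classes // 2, num_classes // 4
    if shift == "closed":          # Lu == Lp: no category shift
        lp, lu = classes, classes
    elif shift == "partial":       # Lu is a strict subset of Lp
        lp, lu = classes, classes[:half]
    elif shift == "open":          # Lu contains classes outside Lp (OOD)
        lp, lu = classes[:half], classes
    elif shift == "open-partial":  # Lp and Lu only partially overlap
        lp, lu = classes[:half], classes[quarter:half + quarter]
    else:
        raise ValueError(f"unknown shift: {shift}")
    le = classes                   # evaluation mixes ID and OOD samples
    return lp, lu, le

# Example: the 12-class VisDA-C under an open-partial shift.
lp, lu, le = make_category_shift(12, "open-partial")
```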
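The quoted implementation details translate naturally into the optimizer setup sketched below: only the learnable text prompt and the affine parameters of the visual encoder's normalization layers are trained, with SGD and a cosine schedule. This is a sketch under assumptions: `model.prompt_ctx` and `model.visual` are hypothetical attribute names for the learnable prompt context and the CLIP image encoder, and the momentum value is not stated in the paper.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR

def build_optimizer(model, total_steps, lr=1e-4):
    """Collect only the parameters described as trainable: the learnable
    text-prompt context and the affine parameters of the visual encoder's
    normalization layers (BatchNorm in ResNet-50, LayerNorm in ViT-B/16)."""
    params = [model.prompt_ctx]                     # hypothetical prompt tensor
    for m in model.visual.modules():
        if isinstance(m, (torch.nn.BatchNorm2d, torch.nn.LayerNorm)):
            params += list(m.parameters())          # affine weight and bias
    optimizer = SGD(params, lr=lr, momentum=0.9)    # paper: lr 1e-5 for EntMin
    scheduler = CosineAnnealingLR(optimizer, T_max=total_steps)
    return optimizer, scheduler
```

Restricting training to the prompt and normalization affines keeps the CLIP backbone frozen, which matches the paper's fine-tuning budget of 50 epochs for the small datasets and 5 for the large ones.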