CLIPCEIL: Domain Generalization through CLIP via Channel rEfinement and Image-text aLignment

Authors: Xi Yu, Shinjae Yoo, Yuewei Lin

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on five widely used benchmark datasets demonstrate that CLIPCEIL outperforms the existing state-of-the-art methods.
Researcher Affiliation | Academia | Xi Yu, Shinjae Yoo, Yuewei Lin; Artificial Intelligence Department, Computing and Data Science Directorate, Brookhaven National Laboratory, Upton, NY 11973; {xyu1; sjyoo; ywlin}@bnl.gov
Pseudocode | Yes | Algorithm 1: Training Procedure of CLIPCEIL
Open Source Code | Yes | The source code is available at https://github.com/yuxi120407/CLIPCEIL.
Open Datasets | Yes | We evaluate our proposed method on five standard DG benchmarks: PACS [28] contains 9,991 images of 7 categories from 4 domains; VLCS [48] comprises 5 categories from 4 domains, 10,729 images in total; OfficeHome [52] contains 15,579 images of 65 categories from 4 domains; Terra Incognita [2] contains 24,788 images with 10 categories from 4 domains; DomainNet [38], the most recent and largest of the five datasets, contains about 0.6 million images in 345 categories from 6 domains.
Dataset Splits | Yes | In all experiments, we use the open-source code DomainBed [16] and follow the train-validate-test split of each dataset on the DomainBed benchmark. ... Our model is selected based on the source domain validation set.
Hardware Specification | Yes | All experiments are conducted on NVIDIA A100 GPUs. ... All experiments are conducted on a GPU server equipped with 4 NVIDIA A100-SXM4-80GB GPUs, although only 2 were used for this paper. The server also has an Intel Xeon Gold 6336Y CPU @ 2.40GHz with 24 cores and 48 threads, and 1 TB of memory.
Software Dependencies | Yes | Our CLIPCEIL model is implemented and evaluated with Python 3.8.13, PyTorch 1.8.0, Torchvision 0.9.0, and CUDA 11.1.
Experiment Setup | Yes | We fixed the image and text encoders and solely trained the adapter g during training. ... Following the literature, we train our model for 5,000 iterations on the PACS, VLCS, OfficeHome, and Terra Incognita datasets and 15,000 iterations on the DomainNet dataset. ... Our optimizer is AdamW [34] with a weight decay of 5e-4, and the learning rate is initialized to 5e-5, gradually decreasing via a cosine annealing scheduler. We adopt a batch size of 32 for all datasets, and all images are randomly resized and cropped to 224×224.
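The reported schedule (initial learning rate 5e-5, decayed by cosine annealing over the training iterations) can be sketched in plain Python. This is an illustrative reimplementation of the standard cosine-annealing formula, not code from the CLIPCEIL repository; the function name and the zero minimum learning rate are assumptions.

```python
import math

def cosine_annealing_lr(step, total_steps, base_lr=5e-5, min_lr=0.0):
    """Cosine-annealed learning rate: starts at base_lr, decays to min_lr."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * step / total_steps))

# Hyperparameters reported in the setup above (assumed min_lr of 0).
TOTAL_STEPS = 5000      # 15000 for DomainNet
BASE_LR = 5e-5
WEIGHT_DECAY = 5e-4     # passed to AdamW in the paper's setup
BATCH_SIZE = 32

print(cosine_annealing_lr(0, TOTAL_STEPS, BASE_LR))           # 5e-05 at the first step
print(cosine_annealing_lr(TOTAL_STEPS // 2, TOTAL_STEPS, BASE_LR))  # half of base_lr at midpoint
print(cosine_annealing_lr(TOTAL_STEPS, TOTAL_STEPS, BASE_LR)) # 0.0 at the last step
```

In a PyTorch training loop this corresponds to pairing `torch.optim.AdamW(..., lr=5e-5, weight_decay=5e-4)` with `torch.optim.lr_scheduler.CosineAnnealingLR`.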