CLIPCEIL: Domain Generalization through CLIP via Channel rEfinement and Image-text aLignment

Authors: Xi Yu, Shinjae Yoo, Yuewei Lin

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on five widely used benchmark datasets demonstrate that CLIPCEIL outperforms the existing state-of-the-art methods.
Researcher Affiliation | Academia | Xi Yu, Shinjae Yoo, Yuewei Lin; Artificial Intelligence Department, Computing and Data Science Directorate, Brookhaven National Laboratory, Upton, NY 11973; {xyu1; sjyoo; ywlin}@bnl.gov
Pseudocode | Yes | Algorithm 1: Training Procedure of CLIPCEIL
Open Source Code | Yes | The source code is available at https://github.com/yuxi120407/CLIPCEIL.
Open Datasets | Yes | We evaluate our proposed method on five standard DG benchmarks: PACS [28] contains 9,991 images of 7 categories from 4 domains; VLCS [48] comprises 5 categories from 4 domains, 10,729 images in total; OfficeHome [52] contains 15,579 images of 65 categories from 4 domains; Terra Incognita [2] contains 24,788 images with 10 categories from 4 domains; DomainNet [38], the most recent and largest of the five datasets, contains about 0.6 million images in 345 categories from 6 domains.
Dataset Splits | Yes | In all experiments, we use the open-source code DomainBed [16] and follow the train-validate-test split of each dataset on the DomainBed benchmark. ... Our model is selected based on the source domain validation set.
Hardware Specification | Yes | All experiments are conducted on NVIDIA A100 GPUs. ... All experiments are conducted on a GPU server equipped with 4 NVIDIA A100-SXM4-80GB GPUs, although only 2 were used for this paper. The server also has an Intel Xeon Gold 6336Y CPU @ 2.40GHz with 24 cores and 48 threads, and 1 TB of memory.
Software Dependencies | Yes | Our CLIPCEIL model is implemented and evaluated with Python 3.8.13, PyTorch 1.8.0, Torchvision 0.9.0, and CUDA 11.1.
Experiment Setup | Yes | We fixed the image and text encoders and solely trained the adapter g during training. ... Following the literature, we train our model for 5,000 iterations on the PACS, VLCS, OfficeHome, and Terra Incognita datasets and 15,000 iterations on the DomainNet dataset. ... Our optimizer is AdamW [34] with a weight decay of 5e-4, and the learning rate is initialized to 5e-5, gradually decreasing via a cosine annealing scheduler. We adopt a batch size of 32 for all datasets, and all images are randomly resized and cropped to 224×224.
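The reported schedule (initial learning rate 5e-5, decayed by cosine annealing over the training iterations) can be sketched in plain Python. This is an illustrative reimplementation of the standard cosine-annealing formula, not code from the CLIPCEIL repository; the function name and the zero minimum learning rate are assumptions.

```python
import math

def cosine_annealing_lr(step, total_steps, base_lr=5e-5, min_lr=0.0):
    """Cosine-annealed learning rate: starts at base_lr, decays to min_lr."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * step / total_steps))

# Hyperparameters reported in the setup above (assumed min_lr of 0).
TOTAL_STEPS = 5000      # 15000 for DomainNet
BASE_LR = 5e-5
WEIGHT_DECAY = 5e-4     # passed to AdamW in the paper's setup
BATCH_SIZE = 32

print(cosine_annealing_lr(0, TOTAL_STEPS, BASE_LR))           # 5e-05 at the first step
print(cosine_annealing_lr(TOTAL_STEPS // 2, TOTAL_STEPS, BASE_LR))  # half of base_lr at midpoint
print(cosine_annealing_lr(TOTAL_STEPS, TOTAL_STEPS, BASE_LR)) # 0.0 at the last step
```

In a PyTorch training loop this corresponds to pairing `torch.optim.AdamW(..., lr=5e-5, weight_decay=5e-4)` with `torch.optim.lr_scheduler.CosineAnnealingLR`.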