Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
CLIPood: Generalizing CLIP to Out-of-Distributions
Authors: Yang Shu, Xingzhuo Guo, Jialong Wu, Ximei Wang, Jianmin Wang, Mingsheng Long
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques. |
| Researcher Affiliation | Collaboration | ¹School of Software, BNRist, Tsinghua University. ²Institute for Interdisciplinary Information Sciences, Tsinghua University. ³Tencent Inc, China. |
| Pseudocode | Yes | Algorithm 1 Training Procedure of CLIPood |
| Open Source Code | Yes | Code is available at https://github.com/thuml/CLIPood. |
| Open Datasets | Yes | We use five multi-domain datasets in DomainBed (Gulrajani & Lopez-Paz, 2021): PACS (Li et al., 2017), VLCS (Torralba & Efros, 2011), OfficeHome (Venkateswara et al., 2017), TerraIncognita (Beery et al., 2018) and DomainNet (Peng et al., 2019). |
| Dataset Splits | Yes | We follow the train-validate-test split of each dataset as the DomainBed benchmark and the leave-one-out evaluation protocol, where at each time, one domain is chosen as the test domain for evaluating OOD generalization, and other domains are chosen as the training domains. |
| Hardware Specification | Yes | We use a machine with 32 CPUs, 256 GB memory, and the NVIDIA TITAN X GPU. |
| Software Dependencies | Yes | For the experiments, we use PyTorch 1.13.1, torchvision 0.14.1, and CUDA 11.6 libraries. |
| Experiment Setup | Yes | We keep the temperature of the softmax function the same as the pre-trained model as τ = 0.01, and use the same hyper-parameter λ = 0.3 for all datasets to avoid over-tuning on specific tasks. We adopt a batch size of 36. We use the AdamW (Loshchilov & Hutter, 2019) optimizer with the cosine learning rate strategy for all datasets. By default, we set β = 0.5, use a learning rate of 5 × 10⁻⁶, and train for 5000 iterations. |
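
The Dataset Splits and Experiment Setup rows can be illustrated with a minimal Python sketch. It is not taken from the CLIPood repository: the names `CONFIG`, `cosine_lr`, and `leave_one_out` are hypothetical, and the schedule assumes plain cosine annealing to zero with no warmup, which the quoted text does not specify. The numeric values are the ones quoted in the table, and the PACS domain names are used only as an example input.

```python
import math

# Hyper-parameters quoted in the Experiment Setup row (names are hypothetical).
CONFIG = {
    "temperature": 0.01,  # softmax temperature tau
    "lambda": 0.3,        # hyper-parameter lambda
    "beta": 0.5,          # hyper-parameter beta
    "batch_size": 36,
    "base_lr": 5e-6,
    "iterations": 5000,
}


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine-annealed learning rate at `step`.

    Assumption: no warmup and decay to `min_lr`; the source only says
    "cosine learning rate strategy".
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def leave_one_out(domains):
    """Yield (train_domains, test_domain), holding out one domain at a time."""
    for held_out in domains:
        yield [d for d in domains if d != held_out], held_out


# Example input: the four PACS domains.
PACS_DOMAINS = ["art_painting", "cartoon", "photo", "sketch"]

if __name__ == "__main__":
    for train_domains, test_domain in leave_one_out(PACS_DOMAINS):
        print(f"train on {train_domains}, test on {test_domain}")
    for step in (0, 2500, 5000):
        lr = cosine_lr(step, CONFIG["iterations"], CONFIG["base_lr"])
        print(f"step {step}: lr = {lr:.3e}")
```

In an actual PyTorch run, the schedule would more likely come from `torch.optim.AdamW` combined with `torch.optim.lr_scheduler.CosineAnnealingLR`; the pure-Python version above only makes the shape of the schedule and the split protocol explicit.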