Test-Time Distribution Normalization for Contrastively Learned Visual-language Models
Authors: Yifei Zhou, Juntao Ren, Fengyu Li, Ramin Zabih, Ser-Nam Lim
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on a wide variety of downstream tasks exhibit a clear advantage of DN over the dot product on top of other existing test-time augmentation methods. Our experiments are designed to answer the following questions: 1) whether our proposed DN can uniformly improve a wide range of cross-modal alignment tasks for different kinds of cross-modal representation models, and whether this gain is larger than that achieved by other zero-shot CLIP augmentations, 2) whether DN can be used in parallel with other common test-time adaptation methods compatible with CLIP, 3) how robust DN is when only scarce, unlabeled data is available to estimate the mean from, and 4) whether DN can improve the performance of fine-tuned models in addition to pre-trained models. |
| Researcher Affiliation | Academia | Yifei Zhou, University of California, Berkeley (yifeizhou@berkeley.edu); Juntao Ren, Cornell University (jlr429@cornell.edu); Fengyu Li, Cornell University (fl334@cornell.edu); Ramin Zabih, Cornell University (rdz@cs.cornell.edu); Ser-Nam Lim, University of Central Florida (sernam@ucf.edu) |
| Pseudocode | No | The paper describes its methodology using mathematical equations and textual explanations but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/fengyuli2002/distribution-normalization. |
| Open Datasets | Yes | The paper uses several well-known and publicly available datasets, including 'COCO [39]', 'Flickr30K [48]', 'ImageNet1K [9]', 'Cifar100 [28]', 'SUN397 [67]', 'Stanford Cars [27]', 'Caltech 101 [31]', 'Flowers102 [44]', 'Flickr8k-expert [18]', 'Flickr8k-cf [18]', 'THumb [24]', and 'Pascal-50S [51]', all of which are properly cited. |
| Dataset Splits | Yes | For all the retrieval tasks, we estimated the mean with 100 random unlabeled samples from the validation set and calculated standard deviations and average recalls with 5 random seeds. We took a train-test split following [68] and [33], which involves a selection of 30K images for fine-tuning and 1K images for testing. |
| Hardware Specification | Yes | We fine-tune CLIP on the MSCOCO training set, for a total of 10 epochs on 4 Nvidia 2080Ti. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' and specific learning rates/weight decays, but does not provide specific version numbers for any software libraries (e.g., Python, PyTorch, CUDA, scikit-learn) used in the experiments. |
| Experiment Setup | Yes | We use the Adam optimizer with a learning rate of 1e-5 and a weight decay of 0.1. We again use the Adam optimizer with a learning rate of 1e-5, but with a weight decay of 0.02. We fine-tune CLIP on the MSCOCO training set, for a total of 10 epochs on 4 Nvidia 2080Ti. |
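The rows above describe DN as replacing the raw image-text dot product with a similarity computed after subtracting a modality mean estimated from a small pool of unlabeled samples (e.g. 100 validation images/captions). A minimal numpy sketch of that idea follows; the centering coefficient `alpha` and the exact form of the centering are illustrative assumptions, not the paper's precise formulation, which should be taken from the paper and its released code.

```python
import numpy as np

def dn_similarity(image_embs, text_embs, image_mean, text_mean, alpha=0.5):
    """Distribution-normalized similarity (illustrative sketch).

    Instead of the raw dot product x . y, subtract a fraction `alpha` of
    each modality's mean embedding (estimated at test time from a small
    unlabeled sample) before taking the dot product. `alpha` and this
    exact centering scheme are assumptions for illustration only.
    """
    x = image_embs - alpha * image_mean
    y = text_embs - alpha * text_mean
    return x @ y.T  # score matrix: one row per image, one column per text

# Toy example: estimate modality means from 100 "unlabeled" samples,
# mirroring the mean-estimation setup quoted in the Dataset Splits row.
rng = np.random.default_rng(0)
unlabeled_imgs = rng.normal(size=(100, 8))
unlabeled_txts = rng.normal(size=(100, 8))
# CLIP-style embeddings are L2-normalized before comparison.
unlabeled_imgs /= np.linalg.norm(unlabeled_imgs, axis=1, keepdims=True)
unlabeled_txts /= np.linalg.norm(unlabeled_txts, axis=1, keepdims=True)

mu_img = unlabeled_imgs.mean(axis=0)
mu_txt = unlabeled_txts.mean(axis=0)

queries = unlabeled_imgs[:5]
gallery = unlabeled_txts[:10]
scores = dn_similarity(queries, gallery, mu_img, mu_txt)
print(scores.shape)  # (5, 10)
```

With `alpha=0` the function reduces to the plain dot product, which makes it easy to A/B the effect of the test-time normalization in a retrieval loop.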