Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models

Authors: Kaican Li, Weiyan Xie, Yongxiang Huang, Didan Deng, Lanqing Hong, Zhenguo Li, Ricardo Silva, Nevin L. Zhang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate DRM on multiple real-world benchmarks and conduct ablation studies to assess the impacts of various design choices. We conduct our experiments on three pre-trained CLIP models of varying sizes: ViT-B/16, ViT-L/14, and ViT-L/14@336 (Radford et al., 2021). Finally, we analyze the reliability of LLM-generated concept descriptions and the impact of λ on performance.
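For context, the three CLIP backbones named above are distributed through OpenAI's reference `clip` package. Below is a minimal sketch of loading them, assuming that package as the tooling; the authors' own loading code may differ.

```python
# Minimal sketch: loading the three CLIP backbones evaluated in the paper.
# Assumes OpenAI's reference package (pip install git+https://github.com/openai/CLIP.git);
# note the package names the 336-pixel variant "ViT-L/14@336px".
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"

for name in ["ViT-B/16", "ViT-L/14", "ViT-L/14@336px"]:
    model, preprocess = clip.load(name, device=device)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.0f}M parameters")
```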
Researcher Affiliation | Collaboration | Kaican Li (1), Weiyan Xie (1), Yongxiang Huang (2), Didan Deng (2), Lanqing Hong (2), Zhenguo Li (1,2), Ricardo Silva (3), Nevin L. Zhang (1); (1) The Hong Kong University of Science and Technology, (2) Huawei, (3) University College London
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It provides a mathematical theorem and its proof in Appendix A, but this is not pseudocode.
Open Source Code | Yes | Our code is available at https://github.com/vaynexie/DRM.
Open Datasets | Yes | Datasets. IMAGENET (Deng et al., 2009) comprises over a million natural images across 1,000 classes... WILDS-IWILDCAM (IWILDCAM) (Koh et al., 2021) contains camera-trap images for wildlife classification... WILDS-FMOW (FMOW) (Koh et al., 2021) is a dataset of satellite images... DOLLAR STREET-DA and GEOYFCC-DA (Prabhu et al., 2022) are datasets for testing model generalization... The datasets employed in our experiments are described in Section 5.1 and are all publicly accessible.
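The two WILDS benchmarks cited above are obtainable through the `wilds` Python package; the following is a minimal sketch of fetching them under that assumption, not the authors' actual data pipeline.

```python
# Minimal sketch: fetching the WILDS benchmarks mentioned above.
# Assumes the `wilds` package (pip install wilds); the resize transform
# here is illustrative, not the preprocessing used in the paper.
from wilds import get_dataset
from wilds.common.data_loaders import get_train_loader
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

dataset = get_dataset(dataset="iwildcam", download=True)  # or "fmow"
train_data = dataset.get_subset("train", transform=transform)  # ID training split
loader = get_train_loader("standard", train_data, batch_size=256)
```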
Dataset Splits | Yes | We use the training set for fine-tuning and the validation set for assessing ID accuracy... We choose all hyperparameters of DRM and baseline methods based on the performance on the ID validation set, i.e., training-domain validation (Gulrajani and Lopez-Paz, 2021)... All performance statistics... are averaged over 5 runs with different random seeds. The 95% confidence intervals over the 5 runs are reported.
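For reference, a 95% confidence interval over 5 seeded runs is conventionally computed from the t-distribution with 4 degrees of freedom. The paper does not state its exact CI formula, so the sketch below assumes a standard t-interval; the accuracy values are hypothetical.

```python
# Minimal sketch: 95% confidence interval over 5 seeded runs.
# A t-interval is assumed; the paper does not specify its CI formula.
import numpy as np
from scipy import stats

accs = np.array([71.2, 70.8, 71.5, 70.9, 71.1])  # hypothetical per-seed accuracies

mean = accs.mean()
sem = stats.sem(accs)                              # standard error of the mean
half_width = sem * stats.t.ppf(0.975, df=len(accs) - 1)
print(f"{mean:.2f} +/- {half_width:.2f}")
```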
Hardware Specification | Yes | All experiments were conducted on a high-performance computing cluster equipped with NVIDIA DGX H800 nodes. Two H800 GPUs with 80 GB VRAM were used for all training runs involving CLIP ViT-B/16 and CLIP ViT-L/14, while four H800 GPUs were employed for training CLIP ViT-L/14@336.
Software Dependencies | No | The paper mentions the 'AdamW' optimizer and the use of the 'GPT-4-turbo API' but does not provide version numbers for software dependencies such as the programming language (e.g., Python), the deep learning framework (e.g., PyTorch), or other libraries necessary for replication.
Experiment Setup | Yes | F.1 Hyperparameter settings: ...for iWildCam, the settings were: training epochs = 20, learning rate = 1e-5, batch size = 256, optimizer = AdamW with weight decay = 0.2. For FMoW: training epochs = 20, learning rate = 1e-5, batch size = 256, optimizer = AdamW with weight decay = 0.2. For ImageNet: training epochs = 10, learning rate = 1e-5, batch size = 256, optimizer = AdamW with weight decay = 0.1. The value of λ used in our DRM training was picked from {1, 2, 3, 4, 5} based on the performance on the ID validation set.
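The reported settings map directly onto a standard PyTorch configuration. The sketch below assumes that DRM's objective can be written as an empirical risk plus λ times a second robustness risk; the precise second term is defined in the paper, and `model`, `batch`, and `drm_risk` are hypothetical placeholders.

```python
# Minimal sketch of the reported fine-tuning configuration.
# `drm_risk` is a hypothetical placeholder; DRM's actual second risk term
# is defined in the paper and not reproduced here.
import torch
import torch.nn.functional as F

HPARAMS = {
    "iwildcam": {"epochs": 20, "lr": 1e-5, "batch_size": 256, "weight_decay": 0.2},
    "fmow":     {"epochs": 20, "lr": 1e-5, "batch_size": 256, "weight_decay": 0.2},
    "imagenet": {"epochs": 10, "lr": 1e-5, "batch_size": 256, "weight_decay": 0.1},
}

def make_optimizer(model: torch.nn.Module, dataset: str) -> torch.optim.AdamW:
    hp = HPARAMS[dataset]
    return torch.optim.AdamW(model.parameters(), lr=hp["lr"],
                             weight_decay=hp["weight_decay"])

def training_step(model, batch, lam: float) -> torch.Tensor:
    x, y = batch
    logits = model(x)
    erm = F.cross_entropy(logits, y)      # empirical risk on the ID training set
    robust = drm_risk(model, x, y)        # placeholder for DRM's second risk term
    return erm + lam * robust             # lam picked from {1, 2, 3, 4, 5} on ID val
```

Consistent with the training-domain validation protocol above, λ would be selected by running this configuration for each candidate value and keeping the one with the best ID validation accuracy.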