Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models
Authors: Kaican Li, Weiyan Xie, Yongxiang Huang, Didan Deng, Lanqing Hong, Zhenguo Li, Ricardo Silva, Nevin L. Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate DRM on multiple real-world benchmarks and conduct ablation studies to assess the impact of various design choices. We conduct our experiments on three pre-trained CLIP models of varying sizes: ViT-B/16, ViT-L/14, and ViT-L/14@336 (Radford et al., 2021). Finally, we analyze the reliability of LLM-generated concept descriptions and the impact of λ on performance. |
| Researcher Affiliation | Collaboration | Kaican Li¹, Weiyan Xie¹, Yongxiang Huang², Didan Deng², Lanqing Hong², Zhenguo Li¹,², Ricardo Silva³, Nevin L. Zhang¹ (¹The Hong Kong University of Science and Technology, ²Huawei, ³University College London) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It provides a mathematical theorem and its proof in Appendix A, but this is not pseudocode. |
| Open Source Code | Yes | Our code is available at https://github.com/vaynexie/DRM. |
| Open Datasets | Yes | Datasets. IMAGENET (Deng et al., 2009) comprises over a million natural images across 1,000 classes... WILDS-IWILDCAM (IWILDCAM) (Koh et al., 2021) contains camera-trap images for wildlife classification... WILDS-FMOW (FMOW) (Koh et al., 2021) is a dataset of satellite images... DOLLAR STREET-DA and GEOYFCC-DA (Prabhu et al., 2022) are datasets for testing model generalization... The datasets employed in our experiments are described in Section 5.1 and are all publicly accessible. |
| Dataset Splits | Yes | We use the training set for fine-tuning and the validation set for assessing ID accuracy... We choose all hyperparameters of DRM and baseline methods based on the performance on the ID validation set, i.e., training-domain validation (Gulrajani and Lopez-Paz, 2021)... All performance statistics... are averaged over 5 runs with different random seeds. The 95% confidence intervals over the 5 runs are reported. (One way to reproduce this aggregation is sketched after the table.) |
| Hardware Specification | Yes | All experiments were conducted on a high-performance computing cluster equipped with NVIDIA DGX H800 nodes. Two H800 GPUs with 80 GB VRAM were used for all training runs involving CLIP ViT-B/16 and CLIP ViT-L/14, while four H800 GPUs were employed for training CLIP ViT-L/14@336. |
| Software Dependencies | No | The paper mentions the 'AdamW' optimizer and the use of the 'GPT-4-turbo API' but does not provide version numbers for software dependencies such as the programming language (e.g., Python), deep learning framework (e.g., PyTorch), or other libraries necessary for replication. |
| Experiment Setup | Yes | F.1 Hyperparameter settings: ...for iWildCam, the settings were: training epochs=20, learning rate=1e-5, batch size=256, and optimizer=AdamW with weight decay=0.2; for FMoW: training epochs=20, learning rate=1e-5, batch size=256, and optimizer=AdamW with weight decay=0.2; for ImageNet: training epochs=10, learning rate=1e-5, batch size=256, and optimizer=AdamW with weight decay=0.1. The value of λ used in DRM training was picked from {1, 2, 3, 4, 5} based on performance on the ID validation set. (These settings are collected in the configuration sketch after the table.) |
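The per-dataset hyperparameters quoted above can be collected into a small configuration map. Below is a minimal sketch, assuming PyTorch; only the values themselves (epochs, learning rate, batch size, AdamW weight decay, and the λ grid) come from the paper, while the `FINETUNE_CONFIGS` and `make_optimizer` names are hypothetical.

```python
import torch

# Per-dataset fine-tuning settings quoted from Appendix F.1 of the paper.
FINETUNE_CONFIGS = {
    "iwildcam": {"epochs": 20, "lr": 1e-5, "batch_size": 256, "weight_decay": 0.2},
    "fmow":     {"epochs": 20, "lr": 1e-5, "batch_size": 256, "weight_decay": 0.2},
    "imagenet": {"epochs": 10, "lr": 1e-5, "batch_size": 256, "weight_decay": 0.1},
}

# Candidate values of the DRM coefficient lambda, selected on the
# ID validation set according to the paper.
LAMBDA_GRID = [1, 2, 3, 4, 5]

def make_optimizer(model: torch.nn.Module, dataset: str) -> torch.optim.AdamW:
    """Build the AdamW optimizer with the settings reported for `dataset`."""
    cfg = FINETUNE_CONFIGS[dataset]
    return torch.optim.AdamW(
        model.parameters(), lr=cfg["lr"], weight_decay=cfg["weight_decay"]
    )
```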
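The paper reports each statistic as a mean with a 95% confidence interval over 5 seeds but does not show its aggregation code. A Student-t interval is the standard choice for so few runs; the sketch below assumes that convention, and the accuracies in the usage example are hypothetical.

```python
import numpy as np
from scipy import stats

def mean_with_ci(accs, confidence=0.95):
    """Return (mean, CI half-width) of per-seed accuracies."""
    accs = np.asarray(accs, dtype=float)
    sem = stats.sem(accs)  # standard error of the mean
    half = sem * stats.t.ppf((1 + confidence) / 2, df=len(accs) - 1)
    return accs.mean(), half

# Example with five hypothetical seed accuracies (percent):
mean, ci = mean_with_ci([81.2, 80.5, 81.9, 81.0, 80.8])
print(f"{mean:.2f} ± {ci:.2f} (95% CI over 5 runs)")
```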