Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models
Authors: Andy Zhou, Jindong Wang, Yu-Xiong Wang, Haohan Wang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4: Experimental Results |
| Researcher Affiliation | Collaboration | ¹University of Illinois at Urbana-Champaign, ²Microsoft Research, ³AI@UIUC; {andyz3, yxw, haohanw}@illinois.edu, jindong.wang@microsoft.com |
| Pseudocode | No | The paper presents mathematical equations and describes procedures in prose, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code at https://github.com/andyz245/DiscreteAdversarialDistillation |
| Open Datasets | Yes | Datasets. We train our models on ImageNet-1K [13]. |
| Dataset Splits | No | The paper trains on ImageNet-1K and evaluates on ImageNet variants (ImageNet-V2, ImageNet-A, ImageNet-Sketch, ImageNet-Rendition, ImageNet-C, Stylized-ImageNet), but it does not specify explicit train/validation/test splits with percentages, sample counts, or references to standard validation splits. |
| Hardware Specification | Yes | We conduct all of our experiments on 8 32GB NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions using VQGAN, AugReg configurations, and hyperparameter settings adopted from other papers, but does not specify software versions (e.g., Python, PyTorch, or CUDA versions). |
| Experiment Setup | Yes | For knowledge distillation, we use a temperature of t = 4 for all models and α = 0.5, following [63]. For DAD, we also weight the second KL-divergence term by α. All ViT models are trained with the AugReg [59] hyperparameter and data augmentation configurations. We use one iteration for the adversarial attack, and an attack learning rate of 0.1. |
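
The setup row above names the standard distillation ingredients: a temperature of t = 4, a mixing weight α = 0.5, and, for DAD, a second KL-divergence term on adversarial examples that is also weighted by α. The PyTorch sketch below shows how such a loss is commonly assembled; the function names (`kd_kl`, `dad_loss`) and the exact composition of the loss terms are illustrative assumptions, not the paper's verified implementation.

```python
import torch
import torch.nn.functional as F

def kd_kl(student_logits, teacher_logits, t=4.0):
    # Temperature-scaled KL divergence between teacher and student
    # distributions, rescaled by t^2 as in standard knowledge distillation.
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

def dad_loss(s_clean, s_adv, t_clean, t_adv, labels, t=4.0, alpha=0.5):
    # Hypothetical composition: cross-entropy on clean images, an
    # alpha-weighted KL term on clean images, and a second alpha-weighted
    # KL term on the adversarial images, per the setup quoted above.
    ce = F.cross_entropy(s_clean, labels)
    return ce + alpha * kd_kl(s_clean, t_clean, t) + alpha * kd_kl(s_adv, t_adv, t)

# Quick shape check with random logits (8 samples, 1000 ImageNet classes).
if __name__ == "__main__":
    logits = [torch.randn(8, 1000) for _ in range(4)]
    labels = torch.randint(0, 1000, (8,))
    print(dad_loss(*logits, labels).item())
```

The t² rescaling keeps gradient magnitudes comparable across temperatures; whether the clean cross-entropy term carries a (1 − α) weight, as in some standard formulations, is not stated in the quoted setup.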