Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models

Authors: Andy Zhou, Jindong Wang, Yu-Xiong Wang, Haohan Wang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4 Experimental Results"
Researcher Affiliation | Collaboration | "1 University of Illinois at Urbana-Champaign, 2 Microsoft Research, 3 AI@UIUC; {andyz3, yxw, haohanw}@illinois.edu, jindong.wang@microsoft.com"
Pseudocode | No | The paper presents mathematical equations and describes procedures in prose, but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Code at https://github.com/andyz245/DiscreteAdversarialDistillation"
Open Datasets | Yes | "Datasets. We train our models on ImageNet-1K [13]."
Dataset Splits | No | The paper mentions training on ImageNet-1K and evaluating on several ImageNet variants (ImageNet-V2, ImageNet-A, ImageNet-Sketch, ImageNet-Rendition, ImageNet-C, Stylized-ImageNet), but it does not specify explicit train/validation/test splits with percentages, sample counts, or references to standard validation splits.
Hardware Specification | Yes | "We conduct all of our experiments on 8 32GB NVIDIA V100 GPUs."
Software Dependencies | No | The paper mentions using VQGAN, AugReg configurations, and hyperparameter settings adopted from other papers, but does not specify software versions (e.g., Python, PyTorch, or CUDA versions).
Experiment Setup | Yes | "For knowledge distillation, we use a temperature of t = 4 for all models and α = 0.5, following [63]. For DAD, we also weight the second KL-divergence term by α. All ViT models are trained with the AugReg [59] hyperparameter and data augmentation configurations. We use one iteration for the adversarial attack, and an attack learning rate of 0.1."
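
The quoted setup fixes the key hyperparameters (t = 4, α = 0.5, a one-iteration attack with learning rate 0.1), so a minimal sketch of how these pieces typically fit together is given below. This is an illustration assuming the standard knowledge-distillation loss convention, not the authors' implementation: `student`, `teacher`, `x_adv`, `dad_loss`, and `one_step_attack` are hypothetical names, the (1 − α)/α split on the clean terms is an assumption, and the real DAD attack perturbs in VQGAN's discrete latent space (see the linked repository for the actual code).

```python
# Hedged sketch of the distillation objective quoted above, NOT the authors'
# implementation. All function and variable names here are placeholders.
import torch
import torch.nn.functional as F

T = 4.0      # KD temperature t = 4 reported in the paper
ALPHA = 0.5  # alpha = 0.5, also used to weight the second KL term in DAD

def kd_kl(student_logits, teacher_logits, t=T):
    """Temperature-scaled KL divergence of standard knowledge distillation."""
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)  # t^2 rescaling keeps gradients comparable across temperatures

def dad_loss(student, teacher, x, x_adv, y, alpha=ALPHA):
    """Cross-entropy plus two KL terms: one on clean inputs, one on the
    (discretized) adversarial inputs, the latter weighted by alpha as the
    quoted setup describes. The (1 - alpha)/alpha combination is the common
    KD convention and an assumption here."""
    s_clean = student(x)
    with torch.no_grad():
        t_clean, t_adv = teacher(x), teacher(x_adv)
    ce = F.cross_entropy(s_clean, y)
    return ((1 - alpha) * ce
            + alpha * kd_kl(s_clean, t_clean)
            + alpha * kd_kl(student(x_adv), t_adv))

def one_step_attack(model, x, y, lr=0.1):
    """One-iteration attack with learning rate 0.1, as in the quoted setup.
    DAD actually perturbs in VQGAN's discrete latent space; this pixel-space
    signed-gradient ascent step is a simplification for illustration."""
    x = x.clone().requires_grad_(True)
    grad, = torch.autograd.grad(F.cross_entropy(model(x), y), x)
    return (x + lr * grad.sign()).detach()
```

The t² factor in `kd_kl` is the usual Hinton-style correction that keeps the KL term's gradient magnitude comparable to the cross-entropy as the temperature grows; with only one attack iteration, the attack learning rate of 0.1 directly sets the perturbation size.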