Hybrid Distillation: Connecting Masked Autoencoders with Contrastive Learners
Authors: Bowen Shi, Xiaopeng Zhang, Yaoming Wang, Jin Li, Wenrui Dai, Junni Zou, Hongkai Xiong, Qi Tian
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that Hybrid Distill achieves superior performance on various benchmark datasets. |
| Researcher Affiliation | Collaboration | Bowen Shi¹, Xiaopeng Zhang², Yaoming Wang¹, Jin Li¹, Wenrui Dai¹, Junni Zou¹, Hongkai Xiong¹, Qi Tian² (¹Shanghai Jiao Tong University, ²Huawei Inc.) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/lygsbw/hybriddistill. |
| Open Datasets | Yes | The input size is 224². For ViT-B, the distillation is based on ImageNet-1K (Russakovsky et al., 2015), and runs for 300 epochs for main results and 100 epochs for ablation studies. For ViT-L, we conduct 300-epoch distillation based on ImageNet-1K and 40-epoch distillation based on ImageNet-21K, respectively. Performance is tested on different downstream tasks, including ImageNet-1K, CIFAR100 (Krizhevsky et al., 2009), Cars (Krause et al., 2013), and iNaturalist19 (Van Horn et al., 2018) classification, COCO (Lin et al., 2014) object detection and instance segmentation, and ADE20K (Zhou et al., 2019) segmentation. |
| Dataset Splits | No | The paper mentions using standard datasets such as ImageNet-1K, COCO, and ADE20K for training and evaluation, but does not explicitly provide the training/validation/test splits used for reproduction, nor does it cite predefined splits. |
| Hardware Specification | Yes | Our experiments are conducted on 8 V100 GPUs. |
| Software Dependencies | No | The paper mentions the use of the AdamW optimizer, ViT, and Mask R-CNN frameworks but does not provide specific version numbers for programming languages or libraries such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | The batch size, learning rate, and weight decay are set to 1024, 6e-4, and 0.05, respectively. The AdamW (Loshchilov & Hutter, 2017) optimizer and a cosine decay (Loshchilov & Hutter, 2016) schedule are used. The input size is 224². For ViT-B, the distillation is based on ImageNet-1K (Russakovsky et al., 2015), and runs for 300 epochs for main results and 100 epochs for ablation studies. For ViT-L, we conduct 300-epoch distillation based on ImageNet-1K and 40-epoch distillation based on ImageNet-21K, respectively. The hyperparameters α and β are both set to 1.0, and the redundant token masking set I is set to [0, L/3, 2L/3] following Li et al. (2023). |
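
The quoted experiment setup is concrete enough to sketch in code. The snippet below is a minimal PyTorch sketch covering only the numbers reported above (AdamW, learning rate 6e-4, weight decay 0.05, cosine decay, α = β = 1.0, masking set [0, L/3, 2L/3] for an L-layer ViT). The function names (`build_optimizer_and_schedule`, `hybrid_distill_loss`) and the smooth-L1 form of the distillation terms are illustrative assumptions, not the authors' implementation, which is available in the linked repository.

```python
# Minimal sketch of the reported training configuration; NOT the official code.
# Reported values: batch size 1024, lr 6e-4, weight decay 0.05, AdamW + cosine decay,
# loss weights alpha = beta = 1.0, redundant-token masking set I = [0, L/3, 2L/3].
import torch
import torch.nn.functional as F


def build_optimizer_and_schedule(student: torch.nn.Module,
                                 total_epochs: int = 300,
                                 base_lr: float = 6e-4,
                                 weight_decay: float = 0.05):
    """AdamW optimizer with a cosine-decay learning-rate schedule."""
    optimizer = torch.optim.AdamW(student.parameters(),
                                  lr=base_lr, weight_decay=weight_decay)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                                           T_max=total_epochs)
    return optimizer, scheduler


def masking_layer_set(num_layers: int):
    """Redundant-token masking set I = [0, L/3, 2L/3] for an L-layer ViT."""
    return [0, num_layers // 3, 2 * num_layers // 3]


def hybrid_distill_loss(student_feats, mim_teacher_feats, contrastive_teacher_feats,
                        alpha: float = 1.0, beta: float = 1.0):
    """Weighted sum of the two distillation terms with alpha = beta = 1.0.
    The smooth-L1 regression form here is an assumption for illustration."""
    loss_mim = F.smooth_l1_loss(student_feats, mim_teacher_feats)
    loss_con = F.smooth_l1_loss(student_feats, contrastive_teacher_feats)
    return alpha * loss_mim + beta * loss_con
```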