SEED: Self-supervised Distillation For Visual Representation
Authors: Zhiyuan Fang, Jianfeng Wang, Lijuan Wang, Lei Zhang, Yezhou Yang, Zicheng Liu
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 EXPERIMENT -- Compared with self-supervised baselines, SEED improves the top-1 accuracy from 42.2% to 67.6% on EfficientNet-B0 and from 36.3% to 68.2% on MobileNetV3-Large on the ImageNet-1k dataset. -- Table 1: ImageNet-1k test accuracy (%) using KNN and linear classification for multiple students and MoCo-V2 pre-trained deeper teacher architectures. |
| Researcher Affiliation | Collaboration | Arizona State University, Microsoft Corporation |
| Pseudocode | Yes | We provide pseudo-code of the SEED distillation in PyTorch (Paszke et al., 2019) style: Q: maintaining queue of previous representations: (N X D) (a runnable sketch of this distillation loss is given below the table) |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for their methodology is publicly available. |
| Open Datasets | Yes | On the ImageNet-1k dataset, SEED improves the linear probe accuracy of EfficientNet-B0 from 42.2% to 67.6% (a gain over 25%), and MobileNet-V3 from 36.3% to 68.2% (a gain over 31%) compared to MoCo-V2 baselines, as shown in Figure 1 and Section 4. -- We conduct the supervised linear classification on ImageNet-1K, which contains 1.3M images for training, and 50,000 images for validation, spanning 1,000 categories. -- The full image ids for semi-supervised evaluation on ImageNet-1k can be found at https://github.com/google-research/simclr/tree/master/imagenet_subsets. |
| Dataset Splits | Yes | We conduct the supervised linear classification on ImageNet-1K, which contains 1.3M images for training, and 50,000 images for validation, spanning 1,000 categories. -- Following (Oord et al., 2018; Kornblith et al., 2019; Kolesnikov et al., 2019), we evaluate the representation on the semi-supervised task, where a fixed 1% or 10% subsets of ImageNet training data (Chen et al., 2020a) are provided with the annotations. -- For the validation set, we randomly pick 10 images (yielding 20% of the dataset) |
| Hardware Specification | No | The paper mentions general hardware terms like 'GPU' ('8 images per GPU') but does not provide specific details on the CPU, GPU models, memory, or cloud instance types used for the experiments. |
| Software Dependencies | Yes | We provide pseudo-code of the SEED distillation in PyTorch (Paszke et al., 2019) style: -- We use Detectron2 (Wu et al., 2019) for the implementations. |
| Experiment Setup | Yes | Our distillation is trained with a standard SGD optimizer with momentum 0.9 and a weight decay parameter of 1e-4 for 200 epochs. The initial learning rate is set as 0.03 and updated by a cosine decay scheduler (Nair & Hinton, 2010) with 5 warm-up epochs and batch size 256. In Eq. 4, the teacher temperature is set as τ_T = 0.01 and the student temperature is τ_S = 0.2. The queue size K is 65,536. (An illustrative sketch of this optimization recipe also follows the table.) |
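
The pseudocode row above describes the SEED distillation loss: the student's similarity distribution over a queue of teacher features is aligned to the teacher's own distribution. Below is a minimal PyTorch sketch of that loss, assuming L2-normalized embeddings, a maintained teacher queue, and the reported temperatures (τ_S = 0.2, τ_T = 0.01); the function name `seed_loss` and the tensor shapes are illustrative, not taken from a released implementation.

```python
import torch
import torch.nn.functional as F

def seed_loss(z_s, z_t, queue, tau_s=0.2, tau_t=0.01):
    """Illustrative SEED-style distillation loss (not the authors' code).

    z_s, z_t: (N, D) student / teacher embeddings of the same images.
    queue:    (K, D) teacher features from previous batches (K = 65,536 in the paper).
    """
    z_s = F.normalize(z_s, dim=1)
    z_t = F.normalize(z_t, dim=1)
    queue = F.normalize(queue, dim=1)

    # Similarities against [current teacher feature | queue]: shape (N, 1 + K).
    # Prepending the teacher's own feature lets the soft target peak on the sample itself.
    logits_s = torch.cat([(z_s * z_t).sum(1, keepdim=True), z_s @ queue.t()], dim=1)
    logits_t = torch.cat([torch.ones(z_t.size(0), 1, device=z_t.device), z_t @ queue.t()], dim=1)

    p_t = F.softmax(logits_t / tau_t, dim=1)          # soft targets from the frozen teacher
    log_p_s = F.log_softmax(logits_s / tau_s, dim=1)  # student log-probabilities
    return -(p_t * log_p_s).sum(dim=1).mean()         # cross-entropy between the two

# Shape check with random embeddings (D = 128, small queue for illustration):
z_s, z_t, queue = torch.randn(8, 128), torch.randn(8, 128), torch.randn(1024, 128)
print(seed_loss(z_s, z_t, queue))
```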
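
The experiment-setup row reports the full optimization recipe (SGD, momentum 0.9, weight decay 1e-4, initial learning rate 0.03, 5 warm-up epochs followed by cosine decay, 200 epochs, batch size 256). The sketch below reconstructs that recipe; the placeholder student module and the hand-rolled warm-up/cosine schedule are assumptions for illustration, not the authors' training script.

```python
import math
import torch

epochs, warmup_epochs, base_lr = 200, 5, 0.03
student = torch.nn.Linear(128, 128)  # placeholder for the student encoder + projection head

optimizer = torch.optim.SGD(student.parameters(),
                            lr=base_lr, momentum=0.9, weight_decay=1e-4)

def lr_at(epoch):
    # Linear warm-up for the first 5 epochs, then cosine decay toward zero.
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (epochs - warmup_epochs)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for epoch in range(epochs):
    for group in optimizer.param_groups:
        group["lr"] = lr_at(epoch)
    # ... one distillation epoch over ImageNet-1k with batch size 256,
    #     computing the distillation loss and refreshing the teacher queue ...
```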