Bag of Instances Aggregation Boosts Self-supervised Distillation
Authors: Haohang Xu, Jiemin Fang, Xiaopeng Zhang, Lingxi Xie, Xinggang Wang, Wenrui Dai, Hongkai Xiong, Qi Tian
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the feature representations of the distilled student networks on several widely used benchmarks. We first report the performance on ImageNet under the linear evaluation and semi-supervised protocols. Then we conduct evaluation on several downstream tasks including object detection and instance segmentation, as well as some ablation studies to diagnose how each component and parameter affects the performance. |
| Researcher Affiliation | Collaboration | (1) Shanghai Jiao Tong University, (2) Huawei Inc., (3) Institute of Artificial Intelligence, Huazhong University of Science & Technology, (4) School of EIC, Huazhong University of Science & Technology |
| Pseudocode | No | The paper describes the method using text and mathematical equations but does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/haohang96/bingo. |
| Open Datasets | Yes | SGD optimizer with momentum 0.9 and weight decay 0.0001 for 200 epochs on ImageNet (Deng et al., 2009). [...] Transfer to CIFAR-10/CIFAR-100 classification Following the evaluation protocol in Fang et al. (2020); Gao et al. (2021), we assess the generalization of BINGO on the CIFAR-10/CIFAR-100 datasets. |
| Dataset Splits | Yes | Table 1: KNN classification accuracy on ImageNet. We report the results on the validation set with 10 nearest neighbors. [...] Linear Evaluation In order to evaluate the performance of BINGO, we train a linear classifier upon the frozen representation, following the common evaluation protocol in Chen et al. (2020c). For fair comparisons, we use the same hyper-parameters as Fang et al. (2020); Gao et al. (2021) during the linear evaluation stage. |
| Hardware Specification | No | The paper mentions '8 GPUs' but does not specify the exact GPU models, CPU models, or any other specific hardware components used for experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | trained with the SGD optimizer with momentum 0.9 and weight decay 0.0001 for 200 epochs on ImageNet (Deng et al., 2009). The batch size and learning rate are set as 256 and 0.03 for 8 GPUs, which simply follows the hyper-parameter settings in Chen et al. (2020c). The learning rate is decayed to 0 by a cosine scheduler. The CutMix used in Gidaris et al. (2021) and Xu et al. (2020b) is also applied to boost the performance. The temperature τ and the size of memory bank are set as 0.2 and 65,536 respectively. |
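
For readers who want to mirror the reported training setup, the sketch below maps the quoted hyper-parameters (SGD, momentum 0.9, weight decay 0.0001, learning rate 0.03 for batch size 256, cosine decay to 0 over 200 epochs) onto standard PyTorch calls. This is an assumption about how the settings translate to code, not the authors' implementation; the `student` module is a placeholder.

```python
# Minimal sketch of the reported optimizer/scheduler configuration.
# Assumes PyTorch; `student` is a placeholder for the distilled student network.
import torch

student = torch.nn.Linear(2048, 128)  # placeholder model for illustration

optimizer = torch.optim.SGD(
    student.parameters(),
    lr=0.03,            # reported for batch size 256 on 8 GPUs
    momentum=0.9,
    weight_decay=1e-4,
)

# Cosine schedule decaying the learning rate to 0 over 200 epochs
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=200, eta_min=0.0
)

for epoch in range(200):
    # ... one training epoch over ImageNet would run here ...
    scheduler.step()
```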