Bag of Instances Aggregation Boosts Self-supervised Distillation

Authors: Haohang Xu, Jiemin Fang, Xiaopeng Zhang, Lingxi Xie, Xinggang Wang, Wenrui Dai, Hongkai Xiong, Qi Tian

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate the feature representations of the distilled student networks on several widely used benchmarks. We first report the performance on ImageNet under the linear evaluation and semi-supervised protocols. Then we conduct evaluation on several downstream tasks including object detection and instance segmentation, as well as some ablation studies to diagnose how each component and parameter affect the performance.
Researcher Affiliation | Collaboration | (1) Shanghai Jiao Tong University; (2) Huawei Inc.; (3) Institute of Artificial Intelligence, Huazhong University of Science & Technology; (4) School of EIC, Huazhong University of Science & Technology
Pseudocode | No | The paper describes the method using text and mathematical equations but does not include any formal pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/haohang96/bingo.
Open Datasets | Yes | SGD optimizer with momentum 0.9 and weight decay 0.0001 for 200 epochs on ImageNet (Deng et al., 2009). [...] Transfer to CIFAR-10/CIFAR-100 classification: Following the evaluation protocol in Fang et al. (2020); Gao et al. (2021), we assess the generalization of BINGO on the CIFAR-10/CIFAR-100 dataset.
Dataset Splits | Yes | Table 1: KNN classification accuracy on ImageNet. We report the results on the validation set with 10 nearest neighbors. [...] Linear Evaluation: In order to evaluate the performance of BINGO, we train a linear classifier upon the frozen representation, following the common evaluation protocol in Chen et al. (2020c). For fair comparisons, we use the same hyper-parameters as Fang et al. (2020); Gao et al. (2021) during the linear evaluation stage.
Hardware Specification | No | The paper mentions '8 GPUs' but does not specify the exact GPU models, CPU models, or any other specific hardware used for the experiments.
Software Dependencies | No | The paper does not provide version numbers for the software dependencies or libraries used in the experiments.
Experiment Setup | Yes | trained with the SGD optimizer with momentum 0.9 and weight decay 0.0001 for 200 epochs on ImageNet (Deng et al., 2009). The batch size and learning rate are set as 256 and 0.03 for 8 GPUs, which simply follow the hyper-parameter settings as in Chen et al. (2020c). The learning rate is decayed to 0 by a cosine scheduler. The CutMix used in Gidaris et al. (2021) and Xu et al. (2020b) is also applied to boost the performance. The temperature τ and the size of the memory bank are set as 0.2 and 65,536 respectively.
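
The 'Pseudocode' row notes that the method is described only in text and equations. As a rough, hypothetical illustration (not the paper's BINGO objective, whose bag-of-instances aggregation is not reproduced here), a memory-bank contrastive distillation step might look like the sketch below; the function name, tensor shapes, and loss form are assumptions, with only the temperature (0.2) and memory-bank size (65,536) taken from the quoted experiment setup.

    import torch
    import torch.nn.functional as F

    def contrastive_distill_loss(student_feat, teacher_feat, memory_bank, tau=0.2):
        # Hypothetical memory-bank InfoNCE-style distillation loss, NOT BINGO's exact objective.
        # student_feat, teacher_feat: (B, D) L2-normalized embeddings.
        # memory_bank: (K, D) queue of negative teacher embeddings, e.g. K = 65536.
        pos = (student_feat * teacher_feat).sum(dim=1, keepdim=True)   # (B, 1) positive similarities
        neg = student_feat @ memory_bank.t()                           # (B, K) negative similarities
        logits = torch.cat([pos, neg], dim=1) / tau                    # temperature tau = 0.2 per the paper
        labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
        return F.cross_entropy(logits, labels)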
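
For the CIFAR-10/CIFAR-100 transfer evaluation quoted in the 'Open Datasets' row, both datasets are available through torchvision; a minimal loading sketch follows. The resize to 224 and the ImageNet normalization statistics are assumptions, as the paper's exact transfer preprocessing is not quoted here.

    import torchvision
    from torchvision import transforms

    # Minimal sketch: load CIFAR-10 for transfer evaluation on frozen features.
    # The 224x224 resize and ImageNet normalization constants are assumptions.
    transform = transforms.Compose([
        transforms.Resize(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    cifar10_train = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
    cifar10_test = torchvision.datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)
    # CIFAR-100 is loaded the same way via torchvision.datasets.CIFAR100.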
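
The 'Dataset Splits' row quotes the kNN evaluation (10 nearest neighbors on the ImageNet validation set). A minimal sketch of such a kNN classifier over frozen, L2-normalized features is given below; the unweighted majority vote is an assumption, since the exact kNN weighting is not quoted.

    import torch

    @torch.no_grad()
    def knn_predict(train_feats, train_labels, test_feats, k=10, num_classes=1000):
        # train_feats: (N_train, D), test_feats: (N_test, D), both L2-normalized,
        # so cosine similarity reduces to a dot product.
        sims = test_feats @ train_feats.t()                # (N_test, N_train)
        topk_sims, topk_idx = sims.topk(k, dim=1)          # 10 nearest neighbors by default
        topk_labels = train_labels[topk_idx]               # (N_test, k)
        # Unweighted majority vote over the k neighbors (weighting schemes vary).
        votes = torch.zeros(test_feats.size(0), num_classes, device=test_feats.device)
        votes.scatter_add_(1, topk_labels, torch.ones_like(topk_sims))
        return votes.argmax(dim=1)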
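
The 'Experiment Setup' row quotes the distillation training hyper-parameters. A minimal PyTorch sketch of that optimizer and schedule is shown below; the ResNet-18 student and its 128-d output head are placeholders, and the distillation loss itself is omitted.

    import torch
    import torchvision

    # Sketch of the quoted setup: SGD (momentum 0.9, weight decay 1e-4), lr 0.03,
    # cosine decay to 0 over 200 epochs. The student backbone is a placeholder.
    student = torchvision.models.resnet18(num_classes=128)
    optimizer = torch.optim.SGD(student.parameters(), lr=0.03,
                                momentum=0.9, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200, eta_min=0.0)

    for epoch in range(200):
        # ... one pass over ImageNet with batch size 256 (distillation loss omitted) ...
        scheduler.step()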