Bag of Instances Aggregation Boosts Self-supervised Distillation

Authors: Haohang Xu, Jiemin Fang, Xiaopeng Zhang, Lingxi Xie, Xinggang Wang, Wenrui Dai, Hongkai Xiong, Qi Tian

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate the feature representations of the distilled student networks on several widely used benchmarks. We first report the performance on ImageNet under the linear evaluation and semi-supervised protocols. Then we conduct evaluation on several downstream tasks including object detection and instance segmentation, as well as some ablation studies to diagnose how each component and parameter affect the performance.
Researcher Affiliation | Collaboration | (1) Shanghai Jiao Tong University; (2) Huawei Inc.; (3) Institute of Artificial Intelligence, Huazhong University of Science & Technology; (4) School of EIC, Huazhong University of Science & Technology
Pseudocode | No | The paper describes the method using text and mathematical equations but does not include any formal pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/haohang96/bingo.
Open Datasets | Yes | SGD optimizer with momentum 0.9 and weight decay 0.0001 for 200 epochs on ImageNet (Deng et al., 2009). [...] Transfer to CIFAR-10/CIFAR-100 classification: Following the evaluation protocol in Fang et al. (2020); Gao et al. (2021), we assess the generalization of BINGO on the CIFAR-10/CIFAR-100 dataset.
Dataset Splits | Yes | Table 1: KNN classification accuracy on ImageNet. We report the results on the validation set with 10 nearest neighbors. [...] Linear Evaluation: In order to evaluate the performance of BINGO, we train a linear classifier upon the frozen representation, following the common evaluation protocol in Chen et al. (2020c). For fair comparisons, we use the same hyper-parameters as Fang et al. (2020); Gao et al. (2021) during the linear evaluation stage.
Hardware Specification | No | The paper mentions '8 GPUs' but does not specify the exact GPU models, CPU models, or any other specific hardware used for the experiments.
Software Dependencies | No | The paper does not provide version numbers for the software dependencies or libraries used in the experiments.
Experiment Setup | Yes | trained with the SGD optimizer with momentum 0.9 and weight decay 0.0001 for 200 epochs on ImageNet (Deng et al., 2009). The batch size and learning rate are set as 256 and 0.03 for 8 GPUs, which simply follow the hyper-parameter settings as in Chen et al. (2020c). The learning rate is decayed to 0 by a cosine scheduler. The CutMix used in Gidaris et al. (2021) and Xu et al. (2020b) is also applied to boost the performance. The temperature τ and the size of the memory bank are set as 0.2 and 65,536 respectively.
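
The 'Pseudocode' row notes that the method is described only in text and equations. As a rough, hypothetical illustration (not the paper's BINGO objective, whose bag-of-instances aggregation is not reproduced here), a memory-bank contrastive distillation step might look like the sketch below; the function name, tensor shapes, and loss form are assumptions, with only the temperature (0.2) and memory-bank size (65,536) taken from the quoted experiment setup.

    import torch
    import torch.nn.functional as F

    def contrastive_distill_loss(student_feat, teacher_feat, memory_bank, tau=0.2):
        # Hypothetical memory-bank InfoNCE-style distillation loss, NOT BINGO's exact objective.
        # student_feat, teacher_feat: (B, D) L2-normalized embeddings.
        # memory_bank: (K, D) queue of negative teacher embeddings, e.g. K = 65536.
        pos = (student_feat * teacher_feat).sum(dim=1, keepdim=True)   # (B, 1) positive similarities
        neg = student_feat @ memory_bank.t()                           # (B, K) negative similarities
        logits = torch.cat([pos, neg], dim=1) / tau                    # temperature tau = 0.2 per the paper
        labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
        return F.cross_entropy(logits, labels)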
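
For the CIFAR-10/CIFAR-100 transfer evaluation quoted in the 'Open Datasets' row, both datasets are available through torchvision; a minimal loading sketch follows. The resize to 224 and the ImageNet normalization statistics are assumptions, as the paper's exact transfer preprocessing is not quoted here.

    import torchvision
    from torchvision import transforms

    # Minimal sketch: load CIFAR-10 for transfer evaluation on frozen features.
    # The 224x224 resize and ImageNet normalization constants are assumptions.
    transform = transforms.Compose([
        transforms.Resize(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    cifar10_train = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
    cifar10_test = torchvision.datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)
    # CIFAR-100 is loaded the same way via torchvision.datasets.CIFAR100.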
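
The 'Dataset Splits' row quotes the kNN evaluation (10 nearest neighbors on the ImageNet validation set). A minimal sketch of such a kNN classifier over frozen, L2-normalized features is given below; the unweighted majority vote is an assumption, since the exact kNN weighting is not quoted.

    import torch

    @torch.no_grad()
    def knn_predict(train_feats, train_labels, test_feats, k=10, num_classes=1000):
        # train_feats: (N_train, D), test_feats: (N_test, D), both L2-normalized,
        # so cosine similarity reduces to a dot product.
        sims = test_feats @ train_feats.t()                # (N_test, N_train)
        topk_sims, topk_idx = sims.topk(k, dim=1)          # 10 nearest neighbors by default
        topk_labels = train_labels[topk_idx]               # (N_test, k)
        # Unweighted majority vote over the k neighbors (weighting schemes vary).
        votes = torch.zeros(test_feats.size(0), num_classes, device=test_feats.device)
        votes.scatter_add_(1, topk_labels, torch.ones_like(topk_sims))
        return votes.argmax(dim=1)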
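
The 'Experiment Setup' row quotes the distillation training hyper-parameters. A minimal PyTorch sketch of that optimizer and schedule is shown below; the ResNet-18 student and its 128-d output head are placeholders, and the distillation loss itself is omitted.

    import torch
    import torchvision

    # Sketch of the quoted setup: SGD (momentum 0.9, weight decay 1e-4), lr 0.03,
    # cosine decay to 0 over 200 epochs. The student backbone is a placeholder.
    student = torchvision.models.resnet18(num_classes=128)
    optimizer = torch.optim.SGD(student.parameters(), lr=0.03,
                                momentum=0.9, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200, eta_min=0.0)

    for epoch in range(200):
        # ... one pass over ImageNet with batch size 256 (distillation loss omitted) ...
        scheduler.step()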