Self-supervised Models are Good Teaching Assistants for Vision Transformers

Authors: Haiyan Wu, Yuting Gao, Yinqi Zhang, Shaohui Lin, Yuan Xie, Xing Sun, Ke Li

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments verify the effectiveness of SSTA and demonstrate that the proposed SSTA is a good compensation to the supervised teacher. [...] Extensive experiments are conducted to demonstrate the advantage of the self-supervised teaching assistant.
Researcher Affiliation | Collaboration | (1) School of Computer Science and Technology, East China Normal University, Shanghai, China; (2) Tencent Youtu Lab, Shanghai, China.
Pseudocode | No | The paper describes the methodology using text and mathematical equations for attention computation (Eqs. 1 and 2), but does not provide any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code is released in https://github.com/GlassyWu/SSTA
Open Datasets | Yes | ImageNet (Russakovsky et al., 2015) is used to verify the effectiveness of our method. CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009) are adopted for downstream transferring tasks. ImageNet-C (Hendrycks & Dietterich, 2019) is utilized to analyze the robustness of the representations. The SIN dataset (Geirhos et al., 2018) is used to evaluate the shape bias of models.
Dataset Splits | Yes | The abscissa is the top 10 categories in the validation dataset of ImageNet predicted by the SL teacher and the SSL teacher, and the ordinate is the specific number. [...] The total numbers of distillation epochs are 300 and 400 for DeiT and XCiT respectively, and the corresponding early-stop epochs are 100 and 150.
Hardware Specification | No | The paper describes various experimental settings, including datasets and training parameters, but does not specify the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the implementation, such as Python, PyTorch, or CUDA versions.
Experiment Setup | Yes | Following DeiT and XCiT, the total numbers of distillation epochs are 300 and 400 for DeiT and XCiT respectively, and the corresponding early-stop epochs are 100 and 150. [...] The total loss is defined as follows: $\mathcal{L}_{Total} = \alpha \mathcal{L}_{CE}(f_S(X), y) + \beta \mathcal{L}_{KD}^{SL} + \lambda \mathcal{L}_{KD}^{SSL}$ (Eq. 7), where $\mathcal{L}_{CE}(\cdot)$ denotes cross entropy and y is the ground truth. α, β and λ are the hyper-parameters that control the weights of the CE loss and the distillation losses.
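
To make Eq. 7 concrete, below is a minimal PyTorch sketch of how the three loss terms could be combined. The function name `ssta_total_loss`, the logit-level KL-divergence form of the two distillation terms, and the shared temperature are illustrative assumptions; the paper's actual $\mathcal{L}_{KD}^{SL}$ and $\mathcal{L}_{KD}^{SSL}$ terms may be defined over other signals (e.g., the attention maps referenced in Eqs. 1 and 2), so this is a sketch of the weighting scheme rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F


def ssta_total_loss(student_logits, sl_teacher_logits, ssl_teacher_logits, targets,
                    alpha=1.0, beta=1.0, lam=1.0, temperature=1.0):
    """Weighted sum of CE and two distillation terms, mirroring Eq. 7.

    The KL-divergence-on-logits form of the distillation terms is an
    assumption; the paper may distill different signals (e.g., attention maps).
    """
    # Supervised cross-entropy against ground-truth labels: L_CE(f_S(X), y)
    ce = F.cross_entropy(student_logits, targets)

    def soft_kd(teacher_logits):
        # Standard soft distillation: KL between temperature-scaled distributions
        return F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2

    kd_sl = soft_kd(sl_teacher_logits)    # from the supervised (SL) teacher
    kd_ssl = soft_kd(ssl_teacher_logits)  # from the self-supervised (SSL) teaching assistant

    return alpha * ce + beta * kd_sl + lam * kd_ssl


# Example usage with random stand-ins for real model outputs:
logits_s = torch.randn(8, 1000)
loss = ssta_total_loss(logits_s, torch.randn(8, 1000), torch.randn(8, 1000),
                       torch.randint(0, 1000, (8,)))
```

The weights α, β and λ simply rescale the three terms before summation; the report excerpt above does not state the values used, so the defaults of 1.0 here are placeholders.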