Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Self-supervised Models are Good Teaching Assistants for Vision Transformers
Authors: Haiyan Wu, Yuting Gao, Yinqi Zhang, Shaohui Lin, Yuan Xie, Xing Sun, Ke Li
ICML 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments verify the effectiveness of SSTA and demonstrate that the proposed SSTA is a good compensation to the supervised teacher. [...] Extensive experiments are conducted to demonstrate the advantage of the self-supervised teaching assistant. |
| Researcher Affiliation | Collaboration | (1) School of Computer Science and Technology, East China Normal University, Shanghai, China; (2) Tencent Youtu Lab, Shanghai, China. |
| Pseudocode | No | The paper describes the methodology using text and mathematical equations for attention computation (Eq 1 and 2), but does not provide any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is released in https://github.com/GlassyWu/SSTA |
| Open Datasets | Yes | ImageNet (Russakovsky et al., 2015) is used to verify the effectiveness of our method. CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009) are adopted for downstream transferring tasks. ImageNet-C (Hendrycks & Dietterich, 2019) is utilized to analyze the robustness of the representations. SIN dataset (Geirhos et al., 2018) is used to evaluate the shape bias of models. |
| Dataset Splits | Yes | The abscissa is the top 10 categories in the validation dataset of ImageNet predicted by SL teacher and SSL teacher, and the ordinate is the specific number. [...] The total number of distillation epochs are 300 and 400 for DeiT and XCiT respectively, and the corresponding early stop epochs are 100 and 150. |
| Hardware Specification | No | The paper describes various experimental settings, including datasets and training parameters, but does not specify the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the implementation, such as Python, PyTorch, or CUDA versions. |
| Experiment Setup | Yes | Following DeiT and XCiT, the total number of distillation epochs are 300 and 400 for DeiT and XCiT respectively, and the corresponding early stop epochs are 100 and 150. [...] The total loss is defined as follows: $L_{Total} = \alpha L_{CE}(f^{S}(X), y) + \beta L_{\text{SL KD}} + \lambda L_{\text{SSL KD}}$ (7), where $L_{CE}(\cdot)$ denotes Cross Entropy, and $y$ is ground truth. $\alpha$, $\beta$ and $\lambda$ are the hyper-parameters that control the weights of CE loss and distillation loss. |
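
The total loss quoted in the Experiment Setup row (Eq. 7) is a weighted sum of a cross-entropy term and two distillation terms. The following is a minimal sketch of that weighting only, not the authors' implementation: the function names `cross_entropy` and `total_loss` are hypothetical, the two distillation losses are assumed to be computed elsewhere and passed in as plain numbers, and the default weights of 1.0 are placeholders rather than values reported in the paper.

```python
import math


def cross_entropy(logits, label):
    """Cross entropy of one sample from raw logits, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def total_loss(student_logits, label, sl_kd_loss, ssl_kd_loss,
               alpha=1.0, beta=1.0, lam=1.0):
    """Weighted sum in the shape of Eq. 7: alpha * CE + beta * SL KD + lam * SSL KD.

    sl_kd_loss and ssl_kd_loss are assumed to be precomputed scalars;
    alpha, beta and lam default to placeholder values, not the paper's settings.
    """
    ce = cross_entropy(student_logits, label)
    return alpha * ce + beta * sl_kd_loss + lam * ssl_kd_loss
```

For example, `total_loss([2.0, 0.5, -1.0], 0, sl_kd_loss=0.3, ssl_kd_loss=0.7, beta=0.5, lam=0.5)` combines the three terms with the given weights. How the two distillation terms themselves are defined is left to the paper.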