Spatial Ensemble: a Novel Model Smoothing Mechanism for Student-Teacher Framework

Authors: Tengteng Huang, Yifan Sun, Xun Wang, Haotian Yao, Chi Zhang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the effectiveness of the proposed Spatial-Temporal Smoothing by applying it to the state-of-the-art self-supervised approaches (MoCo [1] and BYOL [2]) and the semi-supervised method (FixMatch [7]). We use the official implementation of MoCo and re-implement BYOL and FixMatch using PyTorch [30]. All experiments are conducted on a machine with 8 RTX 2080 GPUs.
Researcher Affiliation | Collaboration | Megvii Technology; {huangtengteng, yaohaotian, zhangchi}@megvii.com, sunyf15@tsinghua.org.cn, bnuwangxun@gmail.com
Pseudocode | Yes | Algorithm 1: Pseudocode of SE (Spatial Ensemble). Algorithm 2: Pseudocode of STS (Spatial-Temporal Smoothing). An illustrative sketch of the update rule follows this table.
Open Source Code | Yes | Code and models are available at: https://github.com/tengteng95/Spatial_Ensemble.
Open Datasets | Yes | All the ablation experiments are conducted on the ImageNet dataset [36] and trained for 200 epochs unless noted otherwise. ... We use CIFAR-10 and CIFAR-100 as the benchmark datasets.
Dataset Splits | No | The paper uses standard benchmarks (ImageNet, CIFAR-10/100) and states that it follows the training and evaluation settings of the original papers, but it does not explicitly give train/validation/test split percentages or sample counts in its own text.
Hardware Specification | Yes | All experiments are conducted on a machine with 8 RTX 2080 GPUs.
Software Dependencies | No | The paper mentions using PyTorch but does not specify its version or list any other software dependencies with version numbers.
Experiment Setup | Yes | The masking probability p is set to 0.7/0.5/0.5 for BYOL/MoCo/FixMatch, respectively. ... The initial learning rate is set to 0.03 and adjusted by a cosine learning-rate scheduler [32]. Following the original paper, we train the model using SGD with momentum of 0.9, weight decay of 0.0001, and a mini-batch size of 256. A sketch of this configuration also follows the table.
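
The pseudocode row above refers to the paper's Algorithm 1 (SE) and Algorithm 2 (STS). As a rough illustration only, here is a minimal PyTorch sketch of a spatial-temporal-smoothing-style teacher update. The function name sts_update, the per-weight Bernoulli mask, and the reading of p as the probability of keeping the temporally smoothed value are assumptions made for illustration; the authors' exact masking granularity and update rule are specified in their Algorithms 1 and 2.

```python
import torch

@torch.no_grad()
def sts_update(student, teacher, p=0.5, m=0.999):
    """Hypothetical sketch of Spatial-Temporal Smoothing (STS).

    Each teacher weight is either kept on the usual EMA trajectory
    (temporal smoothing, with probability p) or directly replaced by
    the corresponding student weight (spatial ensemble, probability 1 - p).
    """
    for s_param, t_param in zip(student.parameters(), teacher.parameters()):
        # Per-weight Bernoulli mask: 1 keeps the EMA value, 0 copies the student.
        mask = (torch.rand_like(t_param) < p).float()
        ema = m * t_param + (1.0 - m) * s_param
        t_param.copy_(mask * ema + (1.0 - mask) * s_param)
```

Under the settings quoted above, such an update would run once per training step, e.g. sts_update(student, teacher, p=0.7) for BYOL and p=0.5 for MoCo and FixMatch.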
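
The experiment-setup row quotes the optimizer and schedule; the snippet below restates that configuration as runnable PyTorch. The ResNet-50 backbone and the per-epoch scheduler step are placeholders assumed for illustration; the learning rate, momentum, weight decay, batch size, and 200-epoch cosine schedule come from the quotes above.

```python
import torch
import torchvision

# Placeholder backbone; the paper follows the MoCo/BYOL/FixMatch settings.
model = torchvision.models.resnet50()

# Quoted setup: SGD, initial lr 0.03, momentum 0.9, weight decay 0.0001.
optimizer = torch.optim.SGD(model.parameters(), lr=0.03,
                            momentum=0.9, weight_decay=1e-4)

# Cosine learning-rate schedule [32] over the 200-epoch ablation budget.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

for epoch in range(200):
    ...  # train one epoch with mini-batches of 256 samples
    scheduler.step()
```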