Spatial Ensemble: a Novel Model Smoothing Mechanism for Student-Teacher Framework
Authors: Tengteng Huang, Yifan Sun, Xun Wang, Haotian Yao, Chi Zhang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness of the proposed Spatial-Temporal Smoothing by applying it to the state-of-the-art self-supervised approaches (MoCo [1] and BYOL [2]) and semi-supervised method (FixMatch [7]). We use the official implementation of MoCo and re-implement BYOL and FixMatch using PyTorch [30]. All experiments are conducted on a machine with 8 RTX2080 GPUs. |
| Researcher Affiliation | Collaboration | Megvii Technology {huangtengteng, yaohaotian, zhangchi}@megvii.com, sunyf15@tsinghua.org.cn, bnuwangxun@gmail.com |
| Pseudocode | Yes | Algorithm 1 Pseudocode of SE. Algorithm 2 Pseudocode of STS. |
| Open Source Code | Yes | Codes and models are available at: https://github.com/tengteng95/Spatial_Ensemble. |
| Open Datasets | Yes | All the ablation experiments are conducted on the ImageNet dataset [36] and trained for 200 epochs unless noted otherwise. ... We use CIFAR-10 and CIFAR-100 as the benchmark datasets. |
| Dataset Splits | No | The paper uses standard benchmarks (ImageNet, CIFAR-10/100) and states that it follows the training and evaluation settings of the original papers, but it does not explicitly report train/validation/test split percentages or sample counts in its own text. |
| Hardware Specification | Yes | All experiments are conducted on a machine with 8 RTX2080 GPUs. |
| Software Dependencies | No | The paper mentions using 'PyTorch' but does not specify its version number or any other software dependencies with their respective version numbers. |
| Experiment Setup | Yes | The masking probability p is set to 0.7/0.5/0.5 for BYOL/MoCo/FixMatch, respectively. ... The initial learning rate is set to 0.03 and adjusted by cosine learning rate scheduler [32]. Following the original paper, we train the model using SGD with momentum of 0.9, weight decay of 0.0001, and a mini-batch size of 256. |
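
The SE and STS procedures referenced in the Pseudocode row are given as Algorithms 1 and 2 in the paper and in the linked repository. As a rough illustration of the core idea only, the sketch below copies randomly masked student parameters into the teacher; the element-wise granularity of the mask, the interpretation of the masking probability `p` as the fraction of teacher values kept, and the helper name `spatial_ensemble_update` are assumptions, not the paper's exact Algorithm 1.

```python
import torch

@torch.no_grad()
def spatial_ensemble_update(student: torch.nn.Module, teacher: torch.nn.Module, p: float = 0.5) -> None:
    """Hedged sketch of a Spatial Ensemble-style teacher update.

    For each parameter tensor, a random binary mask keeps roughly a fraction p
    of the teacher's current values and overwrites the remaining entries with
    the student's values, so the teacher accumulates interleaved fragments of
    past students. Mask granularity here is element-wise by assumption.
    """
    for s_param, t_param in zip(student.parameters(), teacher.parameters()):
        keep_mask = (torch.rand_like(t_param) < p).float()
        t_param.copy_(keep_mask * t_param + (1.0 - keep_mask) * s_param)
```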
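The hyperparameters quoted in the Experiment Setup row (SGD with momentum 0.9, weight decay 0.0001, initial learning rate 0.03, cosine learning rate schedule, mini-batch size 256, 200 training epochs) could be wired together roughly as follows; the placeholder model, the epoch binding for the scheduler, and the choice of `CosineAnnealingLR` are assumptions rather than the authors' exact training script.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(2048, 128)  # placeholder for the actual backbone/projector
epochs = 200                        # ablations are trained for 200 epochs per the paper

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.03,            # initial learning rate quoted in the paper
    momentum=0.9,
    weight_decay=1e-4,
)
scheduler = CosineAnnealingLR(optimizer, T_max=epochs)  # cosine learning rate schedule

# The mini-batch size of 256 would be configured in the DataLoader (not shown here).
```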