Towards Fully Sparse Training: Information Restoration with Spatial Similarity

Authors: Weixiang Xu, Xiangyu He, Ke Cheng, Peisong Wang, Jian Cheng | Pages: 2929-2937

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Evaluation of accuracy and efficiency shows that we can achieve 2× training acceleration with negligible accuracy degradation on challenging large-scale classification and detection tasks." "In this section, we evaluate the proposed FST in terms of accuracy and efficiency. Our experiments are conducted on image classification and object detection."
Researcher Affiliation | Academia | Weixiang Xu (1,2), Xiangyu He (1,2), Ke Cheng (1,2), Peisong Wang (1), Jian Cheng (1); 1: NLPR, Institute of Automation, Chinese Academy of Sciences; 2: School of Artificial Intelligence, University of Chinese Academy of Sciences; {xuweixiang2018,chengke2017}@ia.ac.cn, {xiangyu.he, peisong.wang, jcheng}@nlpr.ia.ac.cn
Pseudocode | No | The paper describes its methods in prose but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | "To verify the effectiveness of our method, we first evaluate it on the large-scale ImageNet." "The PASCAL VOC dataset contains around 16k training images with 20 different classes, while the COCO dataset consists of about 80k training images from 80 different categories."
Dataset Splits | No | The paper mentions training details like 'batch size 256 for 120 epochs' but does not explicitly state the dataset splits (e.g., percentages for training, validation, and test sets).
Hardware Specification | Yes | "The execution environment is as below: Tesla A100 GPU ×1, PyTorch 1.7, CUDA 11.1."
Software Dependencies | Yes | "The execution environment is as below: Tesla A100 GPU ×1, PyTorch 1.7, CUDA 11.1."
Experiment Setup | Yes | "We follow hyperparameter settings as (Zhou et al. 2021): all models are trained with batch size 256 for 120 epochs, and learning rate is annealed from 0.1 to 0 with a cosine scheduler. In order to reproduce their reported accuracy, we set weight decay as 7e-5 and use label smooth."
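As a reading aid, the quoted experiment setup maps onto a minimal PyTorch sketch along the lines below. This is not the authors' code (none was released): the ResNet-18 backbone, the SGD momentum of 0.9, and the label-smoothing factor of 0.1 are illustrative assumptions not stated in the quote, and the label_smoothing argument of nn.CrossEntropyLoss only exists from PyTorch 1.10, so the reported PyTorch 1.7 environment would need a hand-written smoothed loss instead.

# Minimal sketch of the quoted training configuration (not the authors' code).
import torch
import torch.nn as nn
import torchvision

EPOCHS = 120          # "trained with batch size 256 for 120 epochs"
BATCH_SIZE = 256      # used when building the ImageNet DataLoader (not shown)
BASE_LR = 0.1         # "learning rate is annealed from 0.1 to 0"
WEIGHT_DECAY = 7e-5   # "we set weight decay as 7e-5"

# Placeholder backbone for illustration; the paper trains its own sparse models.
model = torchvision.models.resnet18()

# Momentum 0.9 is an assumption (standard for ImageNet SGD), not stated in the quote.
optimizer = torch.optim.SGD(model.parameters(), lr=BASE_LR,
                            momentum=0.9, weight_decay=WEIGHT_DECAY)

# Cosine annealing of the learning rate from 0.1 down to 0 over the full run.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS, eta_min=0.0)

# Label smoothing: the built-in argument requires PyTorch >= 1.10; the 0.1 factor is assumed.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

for epoch in range(EPOCHS):
    # One pass over the ImageNet training DataLoader (batch size BATCH_SIZE),
    # computing criterion(outputs, targets) and stepping the optimizer, goes here.
    scheduler.step()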