Towards Fully Sparse Training: Information Restoration with Spatial Similarity
Authors: Weixiang Xu, Xiangyu He, Ke Cheng, Peisong Wang, Jian Cheng (pp. 2929-2937)
AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluation of accuracy and efficiency shows that we can achieve 2× training acceleration with negligible accuracy degradation on challenging large-scale classification and detection tasks. In this section, we evaluate the proposed FST in terms of accuracy and efficiency. Our experiments are conducted on image classification and object detection. |
| Researcher Affiliation | Academia | Weixiang Xu1,2, Xiangyu He1,2, Ke Cheng1,2, Peisong Wang1, Jian Cheng1 1NLPR, Institute of Automation, Chinese Academy of Sciences 2School of Artificial Intelligence, University of Chinese Academy of Sciences {xuweixiang2018,chengke2017}@ia.ac.cn, {xiangyu.he, peisong.wang, jcheng}@nlpr.ia.ac.cn |
| Pseudocode | No | The paper describes its methods in prose but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the methodology or provide a link to a code repository. |
| Open Datasets | Yes | To verify the effectiveness of our method, we first evaluate it on the large-scale ImageNet. The PASCAL VOC dataset contains around 16k training images with 20 different classes, while the COCO dataset consists of about 80k training images from 80 different categories. |
| Dataset Splits | No | The paper mentions training details like 'batch size 256 for 120 epochs' but does not explicitly state the dataset splits (e.g., percentage for training, validation, and test sets). |
| Hardware Specification | Yes | The execution environment is as below: Tesla A100 GPU ×1, PyTorch 1.7, CUDA 11.1. |
| Software Dependencies | Yes | The execution environment is as below: Tesla A100 GPU ×1, PyTorch 1.7, CUDA 11.1. |
| Experiment Setup | Yes | We follow hyperparameter settings as (Zhou et al. 2021): all models are trained with batch size 256 for 120 epochs, and the learning rate is annealed from 0.1 to 0 with a cosine scheduler. In order to reproduce their reported accuracy, we set weight decay to 7e-5 and use label smoothing. (A minimal configuration sketch follows this table.) |
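
The following is a minimal PyTorch sketch of the reported experiment setup (batch size 256, 120 epochs, cosine learning-rate decay from 0.1 to 0, weight decay 7e-5, label smoothing), not the authors' code. The backbone, data loader, momentum, and smoothing factor of 0.1 are assumptions for illustration; they are not stated in the quoted passages.

```python
# Hypothetical reconstruction of the quoted training configuration.
# Placeholders/assumptions: backbone choice, train_loader, momentum=0.9,
# label smoothing factor 0.1. The label_smoothing kwarg requires PyTorch >= 1.10.
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet18(num_classes=1000)   # placeholder backbone
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)    # smoothing value assumed
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=7e-5)

epochs = 120
# Cosine schedule annealing the learning rate from 0.1 down to 0 over 120 epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                                       T_max=epochs, eta_min=0.0)

for epoch in range(epochs):
    for images, targets in train_loader:  # train_loader: user-supplied ImageNet loader, batch size 256
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```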