S$^3$: Sign-Sparse-Shift Reparametrization for Effective Training of Low-bit Shift Networks

Authors: Xinlin Li, Bang Liu, Yaoliang Yu, Wulong Liu, Chunjing Xu, Vahid Partovi Nia

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on the ImageNet dataset. While all previous methods require at least a 5-bit weight representation to match the performance of full-precision neural networks on large datasets such as ImageNet, our experimental results show that our proposed method surpasses all previous methods and pushes this boundary further to 3 bits.
Researcher Affiliation | Collaboration | Xinlin Li (1), Bang Liu (2), Yaoliang Yu (3), Wulong Liu (1), Chunjing Xu (1), and Vahid Partovi Nia (1). (1) Noah's Ark Lab, Huawei Technologies. (2) Department of Computer Science and Operations Research (DIRO), University of Montreal. (3) Cheriton School of Computer Science, University of Waterloo.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Figure 3 illustrates a process but is not a formal pseudocode block. (An illustrative sketch of the reparametrization idea follows the table.)
Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] We will provide the code upon publication.
Open Datasets | Yes | We evaluate our proposed method on the ILSVRC2012 [7] dataset with different bit-widths to demonstrate the effectiveness and robustness of our method. We use ResNet-18 and ResNet-50 as our backbones with the same data augmentation and pre-processing strategy proposed in [13].
Dataset Splits | Yes | We evaluate our proposed method on the ILSVRC2012 [7] dataset... We use ResNet-18 and ResNet-50 as our backbones with the same data augmentation and pre-processing strategy proposed in [13]. All models converge and reach a reasonable validation accuracy on CIFAR-10 (> 91%). (A sketch of a standard ImageNet preprocessing pipeline follows the table.)
Hardware Specification | No | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No], but we gave standard training details in Section 5.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | We train the networks for 200 epochs using a cosine learning rate schedule with an initial learning rate of 1e-3. The networks are optimized with the SGD optimizer, with momentum and weight decay set to 0.9 and 1e-4, respectively. The hyper-parameter α of the dense weight regularizer is set to 1e-5 without a decay scheduler. (A training-setup sketch follows the table.)
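Since the paper itself provides no algorithm block, the following is only a hypothetical sketch of what a sign-sparse-shift style reparametrization of a shift-network weight could look like in PyTorch. The decomposition below (a sign bit, a zero/non-zero gate, and a power-of-two shift code, each obtained by thresholding a dense latent parameter with a straight-through estimator) is inferred from the paper's title and is not the authors' exact formulation; every name in the snippet is hypothetical.

```python
import torch


def ste_binarize(x: torch.Tensor, threshold: float = 0.0) -> torch.Tensor:
    """Hard threshold in the forward pass, identity gradient in the backward pass."""
    hard = (x > threshold).float()
    return hard + (x - x.detach())  # forward: hard value; backward: gradient of x


class SignSparseShiftWeight(torch.nn.Module):
    """Hypothetical reparametrization of a shift weight as sign * gate * 2^shift.

    Dense latent parameters are trained with SGD; the weight used in the forward
    pass takes values in {0, +/-2^0, ..., +/-2^num_shift_bits}.
    """

    def __init__(self, shape: tuple, num_shift_bits: int = 2):
        super().__init__()
        self.w_sign = torch.nn.Parameter(torch.randn(shape) * 0.01)
        self.w_sparse = torch.nn.Parameter(torch.randn(shape) * 0.01)
        # One latent tensor per shift bit; their binarized sum gives the exponent.
        self.w_shift = torch.nn.Parameter(torch.randn(num_shift_bits, *shape) * 0.01)

    def forward(self) -> torch.Tensor:
        sign = 2.0 * ste_binarize(self.w_sign) - 1.0      # {-1, +1}
        gate = ste_binarize(self.w_sparse)                 # {0, 1}, the sparsity term
        exponent = ste_binarize(self.w_shift).sum(dim=0)   # integer exponent
        return sign * gate * torch.pow(2.0, exponent)      # quantized shift weight
```

A layer would consume `SignSparseShiftWeight((out_features, in_features))()` in place of its full-precision weight; this is illustrative only, not the paper's released code.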
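The dataset rows quote the paper as reusing the augmentation and pre-processing of [13] without restating them. The snippet below is a minimal sketch of the standard ImageNet/ResNet pipeline (random resized crop and horizontal flip for training, resize and center crop for validation), assuming that is what [13] prescribes.

```python
from torchvision import transforms

# ImageNet channel statistics commonly used with ResNet backbones.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),   # random crop, rescaled to 224x224
    transforms.RandomHorizontalFlip(),   # standard flip augmentation
    transforms.ToTensor(),
    normalize,
])

val_transform = transforms.Compose([
    transforms.Resize(256),              # shorter side to 256 pixels
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])
```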
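For the experiment setup quoted in the last row, a minimal PyTorch sketch of the stated recipe (SGD, momentum 0.9, weight decay 1e-4, initial learning rate 1e-3, cosine schedule over 200 epochs) might look as follows. The exact form of the dense weight regularizer is not given in this excerpt, so an L2 penalty on the model parameters weighted by alpha = 1e-5 is assumed purely for illustration, and a tiny synthetic loader stands in for ImageNet so the sketch runs end to end.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

# Synthetic stand-in for the ImageNet loader (the paper trains on ILSVRC2012).
train_loader = DataLoader(
    TensorDataset(torch.randn(8, 3, 224, 224), torch.randint(0, 1000, (8,))),
    batch_size=4,
)

model = resnet18()  # full-precision backbone; the shift-quantized variant is not reproduced here

optimizer = torch.optim.SGD(model.parameters(),
                            lr=1e-3, momentum=0.9, weight_decay=1e-4)
# Cosine learning-rate schedule over the full 200-epoch budget.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

alpha = 1e-5  # dense weight regularizer strength, kept fixed (no decay scheduler)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(200):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        # Assumed L2-style dense weight regularizer; the paper's exact form may differ.
        loss = loss + alpha * sum(p.pow(2).sum() for p in model.parameters())
        loss.backward()
        optimizer.step()
    scheduler.step()
```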