Searching for Low-Bit Weights in Quantized Neural Networks

Authors: Zhaohui Yang, Yunhe Wang, Kai Han, Chunjing Xu, Chao Xu, Dacheng Tao, Chang Xu

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on benchmarks demonstrate that the proposed method is able to produce quantized neural networks with higher performance over the state-of-the-art methods on both image classification and super-resolution tasks. The PyTorch code will be made available at https://github.com/huawei-noah/Binary-Neural-Networks/tree/main/SLB and the MindSpore code will be made available at https://www.mindspore.cn/resources/hub.
Researcher Affiliation | Collaboration | 1 Key Lab of Machine Perception (MOE), Dept. of Machine Intelligence, Peking University. 2 Noah's Ark Lab, Huawei Technologies. 3 School of Computer Science, Faculty of Engineering, University of Sydney.
Pseudocode | Yes | Algorithm 1: Training algorithm of SLB. (A code sketch of this training loop is given after the table.)
Open Source Code | No | The PyTorch code will be made available at https://github.com/huawei-noah/Binary-Neural-Networks/tree/main/SLB and the MindSpore code will be made available at https://www.mindspore.cn/resources/hub.
Open Datasets | Yes | Following common practice in most works, we use the CIFAR-10 [37] and large scale ILSVRC2012 [9] recognition datasets to demonstrate the effectiveness of our method. We use the 291 images as in [55] for training and test on the Set5 dataset [3]. (A CIFAR-10 loading example is given after the table.)
Dataset Splits | No | The paper mentions using CIFAR-10, ILSVRC2012, and Set5 datasets, which have standard splits, but it does not explicitly state the train/validation/test split percentages or sample counts used for its experiments.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions PyTorch and MindSpore but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | We train the network for 500 epochs in total and decay the learning rate by a factor of 10 at 350, 440, and 475 epochs. The learning rate starts from 1e-3, weight decay is set to 0, and the Adam optimizer is used to update parameters. We set Ts = 0.01 and Te = 10. For sin and linear schedulers, the accuracies converge rapidly. (This schedule is sketched in code after the table.)