Entropy-Driven Mixed-Precision Quantization for Deep Network Design

Authors: Zhenhong Sun, Ce Ge, Junyan Wang, Ming Lin, Hesen Chen, Hao Li, Xiuyu Sun

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on three widely adopted benchmarks, ImageNet, VWW and WIDER FACE, demonstrate that our method can achieve the state-of-the-art performance in the tiny deep model regime.
Researcher Affiliation | Collaboration | Zhenhong Sun¹, Ce Ge¹, Junyan Wang¹,², Ming Lin¹,³, Hesen Chen¹, Hao Li¹, Xiuyu Sun¹ (¹Alibaba Group, ²University of New South Wales, ³Amazon.com, Inc.)
Pseudocode | Yes | Appendix B contains "Algorithm 1 Quantization Bits Refinement", which is a structured algorithm block (an illustrative refinement-loop sketch is given after the table).
Open Source Code | Yes | Code and pre-trained models are available at https://github.com/alibaba/lightweight-neural-architecture-search.
Open Datasets | Yes | We use two standard benchmarks in this work: ImageNet [8] and Visual Wake Words (VWW) [7]. ... we further evaluate it on the WIDER FACE [37] object detection dataset.
Dataset Splits | Yes | For ImageNet-1K dataset, our models are trained for 240 epochs without special indication. ... (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] Present in Section 4, Section 5 and Appendix C.
Hardware Specification | Yes | Search uses 64 cores of Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz and training is conducted on 8 NVIDIA V100 GPUs.
Software Dependencies | No | The paper only implicitly relies on common software such as Python and does not provide specific version numbers for any key software components or libraries.
Experiment Setup | Yes | The evolutionary population N is set to 512 with a total of 500,000 iterations. ... low-precision values are randomly selected from {2, 3, 4, 5, 6, 8} ... our models are trained for 240 epochs without special indication. All models are optimized by SGD with a batch size of 512 and a Nesterov momentum factor of 0.9. The initial learning rate is set to 0.4 with cosine learning rate scheduling [21], and the weight decay is set to 4e-6 (a PyTorch sketch of this optimizer and schedule is given after the table).
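
Appendix B presents Algorithm 1 (Quantization Bits Refinement) only as a structured pseudocode block; its exact steps are not reproduced here. The Python sketch below is an illustrative guess at what such a budget-constrained bit-refinement loop could look like, not the authors' procedure. It assumes hypothetical per-layer entropy scores and a model-size budget, and greedily lowers the precision of the layer with the smallest entropy until the budget is met. All names (refine_bits, param_counts, entropies, size_budget) are hypothetical.

```python
# Illustrative sketch only: NOT the paper's Algorithm 1, just a plausible
# greedy bits-refinement loop under a model-size budget.
# Assumptions (hypothetical): per-layer entropy scores are precomputed, and
# model size is the sum of (#params * bit-width) over layers.

ALLOWED_BITS = [8, 6, 5, 4, 3, 2]  # candidate precisions from the paper's search space


def model_size(param_counts, bits):
    """Model size in bits: sum over layers of #params * bit-width."""
    return sum(n * b for n, b in zip(param_counts, bits))


def refine_bits(param_counts, entropies, size_budget):
    """Greedily lower bit-widths until the size budget is met.

    At each step, drop the precision of the layer whose entropy score
    (used here as a proxy for sensitivity) is lowest among the layers
    that can still be reduced.
    """
    bits = [ALLOWED_BITS[0]] * len(param_counts)  # start at the highest precision
    while model_size(param_counts, bits) > size_budget:
        # layers that are not yet at the minimum precision
        candidates = [i for i, b in enumerate(bits) if b > ALLOWED_BITS[-1]]
        if not candidates:
            break  # budget unreachable even at the lowest precision
        i = min(candidates, key=lambda j: entropies[j])
        bits[i] = ALLOWED_BITS[ALLOWED_BITS.index(bits[i]) + 1]
    return bits


# Toy usage: three layers with made-up parameter counts and entropy scores.
print(refine_bits(param_counts=[10_000, 50_000, 20_000],
                  entropies=[0.9, 0.4, 0.7],
                  size_budget=300_000))
```

The greedy rule above is only a placeholder for whatever entropy-driven criterion the paper actually uses; the point is the overall shape of a budget-constrained bit-width refinement loop.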
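
The training hyperparameters reported in the Experiment Setup row map directly onto a standard optimizer/scheduler configuration. The sketch below assumes PyTorch (an assumption; it is not taken from the released code), and `model`, the dummy batch, and the bare 240-epoch loop are placeholders standing in for the real network and data pipeline.

```python
# Minimal sketch of the reported training setup, assuming PyTorch.
import torch

model = torch.nn.Linear(128, 1000)  # placeholder network

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.4,              # initial learning rate
    momentum=0.9,        # Nesterov momentum factor
    nesterov=True,
    weight_decay=4e-6,
)

epochs = 240
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # Dummy step to illustrate the update order; real training would iterate
    # over a data loader with batch size 512 and a classification loss.
    x = torch.randn(512, 128)
    loss = model(x).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # cosine learning rate schedule, stepped once per epoch
```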