Entropy-Driven Mixed-Precision Quantization for Deep Network Design
Authors: Zhenhong Sun, Ce Ge, Junyan Wang, Ming Lin, Hesen Chen, Hao Li, Xiuyu Sun
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on three widely adopted benchmarks, ImageNet, VWW and WIDER FACE, demonstrate that our method can achieve state-of-the-art performance in the tiny deep model regime. |
| Researcher Affiliation | Collaboration | Zhenhong Sun¹, Ce Ge¹, Junyan Wang¹,², Ming Lin¹,³, Hesen Chen¹, Hao Li¹, Xiuyu Sun¹ (¹Alibaba Group, ²University of New South Wales, ³Amazon.com, Inc.) |
| Pseudocode | Yes | Appendix B contains "Algorithm 1 Quantization Bits Refinement" which is a structured algorithm block. |
| Open Source Code | Yes | Code and pre-trained models are available at https://github.com/alibaba/lightweight-neural-architecture-search. |
| Open Datasets | Yes | We use two standard benchmarks in this work: ImageNet [8] and Visual Wake Words (VWW) [7]. ... we further evaluate it on the WIDER FACE [37] object detection dataset. |
| Dataset Splits | Yes | For ImageNet-1K dataset, our models are trained for 240 epochs without special indication. ... (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] Present in Section 4, Section 5 and Appendix C. |
| Hardware Specification | Yes | Search uses 64 cores of Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz and training is conducted on 8 NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper implicitly relies on common software (e.g., Python) but does not give version numbers for any key software components or libraries. |
| Experiment Setup | Yes | The evolutionary population N is set as 512 with a total of 500,000 iterations. ... low-precision values are randomly selected from {2, 3, 4, 5, 6, 8} ... our models are trained for 240 epochs without special indication. All models are optimized by SGD with a batch size of 512 and Nesterov momentum factor of 0.9. Initial learning rate is set to 0.4 with cosine learning rate scheduling [21], and the weight decay is set to 4e-6. (Both the search loop and the training recipe are sketched in the examples after this table.) |
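
The quoted search settings (population 512, 500,000 iterations, per-layer bit widths drawn from {2, 3, 4, 5, 6, 8}) are enough to sketch the outer evolutionary loop. The following is a minimal sketch, assuming a simple tournament-selection scheme; `fitness` is a hypothetical placeholder for the paper's entropy-based score, and `num_layers`, the tournament size, and the mutation rate are illustrative choices, not values from the paper.

```python
import random

BIT_CHOICES = [2, 3, 4, 5, 6, 8]  # low-precision candidates quoted in the paper
POPULATION = 512                  # evolutionary population N
ITERATIONS = 500_000              # total search iterations

def random_bits(num_layers):
    """Assign a random bit width to each layer."""
    return [random.choice(BIT_CHOICES) for _ in range(num_layers)]

def fitness(bits):
    """Hypothetical stand-in for the paper's entropy-based score.

    The real method scores candidates with an entropy measure of the
    quantized network; here we simply reward higher average precision
    as a placeholder so the loop is runnable.
    """
    return sum(bits) / len(bits)

def mutate(bits, prob=0.1):
    """Resample each layer's bit width with probability `prob`."""
    return [random.choice(BIT_CHOICES) if random.random() < prob else b
            for b in bits]

def evolve(num_layers=20, iterations=ITERATIONS):
    """Evolve a population of per-layer bit-width assignments."""
    population = [random_bits(num_layers) for _ in range(POPULATION)]
    for _ in range(iterations):
        # tournament selection: best of a random subset becomes the parent
        parent = max(random.sample(population, k=8), key=fitness)
        child = mutate(parent)
        # replace the currently worst individual with the child
        worst = min(range(POPULATION), key=lambda i: fitness(population[i]))
        population[worst] = child
    return max(population, key=fitness)
```

With a real entropy score plugged into `fitness`, this loop would trade off precision per layer against the score rather than trivially converging to all-8-bit assignments, which is what the placeholder above would do.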
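For reference, the quoted training hyperparameters map directly onto a standard PyTorch optimizer/scheduler setup. This is a minimal sketch, assuming per-epoch scheduler stepping; `build_optimizer` is an illustrative helper, not code from the released repository.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

def build_optimizer(model, epochs=240):
    """SGD with Nesterov momentum and cosine decay, per the quoted setup."""
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=0.4,            # initial learning rate
        momentum=0.9,      # Nesterov momentum factor
        nesterov=True,
        weight_decay=4e-6,
    )
    # cosine learning rate scheduling [21] over the full training run
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler
```

Calling `scheduler.step()` once per epoch anneals the learning rate from 0.4 toward zero over the 240 epochs; the batch size of 512 is set in the data loader, not the optimizer.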