MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization
Authors: Shangyu Chen, Wenya Wang, Sinno Jialin Pan
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted with CIFAR10/100 and ImageNet on various deep networks to demonstrate the advantage of our proposed method in terms of a faster convergence rate and better performance. |
| Researcher Affiliation | Academia | Shangyu Chen Nanyang Technological University, Singapore schen025@e.ntu.edu.sg Wenya Wang Nanyang Technological University, Singapore wangwy@ntu.edu.sg Sinno Jialin Pan Nanyang Technological University, Singapore sinnopan@ntu.edu.sg |
| Pseudocode | Yes | Algorithm 1 MetaQuant. Require: Training dataset {x, y}_n, well-trained full-precision base model W. Ensure: Quantized base model Ŵ. 1: Construct shared meta quantizer M_φ, training iteration t = 0. 2: while not optimal do 3: for Layer l from 1 to L do 4: Ŵ^t_l = Q(W̃^t_l) = Q(W^{t−1}_l − α · π(M_φ(g^{t−1}_{Ŵ_l}, W̃^{t−1}_l) · ∂W̃^{t−1}_l/∂W^{t−1}_l)) 5: end for 6: Calculate loss: ℓ = Loss(f(Q(W̃^t); x), y) 7: Generate g_{Ŵ^t} using the chain rule. 8: Calculate meta gradient g_{W̃^t} using M_φ. 9: Calculate ∂ℓ/∂φ^t by Eq. (8). 10: for Layer l from 1 to L do 11: W^t_l = W^{t−1}_l − α · π(M_φ(g^{t−1}_{Ŵ_l}, W̃^{t−1}_l) · ∂W̃^{t−1}_l/∂W^{t−1}_l) 12: end for 13: φ^{t+1} = φ^t − γ · ∂ℓ/∂φ^t (γ is the learning rate of the meta quantizer) 14: t = t + 1 15: end while (A runnable PyTorch sketch of this loop is given after the table.) |
| Open Source Code | Yes | Codes are released at: https://github.com/csyhhu/MetaQuant |
| Open Datasets | Yes | Three benchmark datasets are used including ImageNet ILSVRC-2012 and CIFAR10/100. |
| Dataset Splits | No | The paper mentions CIFAR10/100 and ImageNet ILSVRC-2012 but does not explicitly state train/validation/test splits; the use of ILSVRC-2012 implies the standard splits, but no details are provided. |
| Hardware Specification | Yes | MetaQuant costs 51.15 seconds to finish one iteration of training while the baseline method uses 38.17 s. However, in real deployment the meta quantizer is removed, so MetaQuant is able to provide better test performance without any extra inference time. (Intel Xeon CPU E5-1650 with GeForce GTX 750 Ti). |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies like Python, PyTorch, TensorFlow, or other libraries. It only mentions using 'SGD' or 'Adam' as optimization methods. |
| Experiment Setup | Yes | For experiments on CIFAR10/100, we set the initial learning rate as α = 1e-3 for base models and the initial learning rate as γ = 1e-3 for the meta quantizer. For fair comparison, we set total training epochs as 100 for all experiments; α and γ are divided by 10 after every 30 epochs. For ImageNet, the initial learning rate is set as α = 1e-4 for the base model using dorefa and BWN. Initial γ is set as 1e-3. α decreases to {1e-5, 1e-6} when training reaches 10/20 epochs. γ reduces to {1e-4, 1e-5} in accordance with the change of the learning rate in base models, with total epochs as 30. Batch size is 128 for CIFAR/ImageNet. (A scheduler sketch reflecting these settings also follows the table.) |
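
The pseudocode row couples three pieces: a shared meta quantizer M_φ that turns the gradient of the quantized weights into a gradient for the full-precision weights, a non-differentiable quantizer Q, and a base-model update that reuses that meta gradient. The sketch below illustrates the loop in PyTorch under simplifying assumptions (a toy linear layer, a BWN-style sign quantizer, an MLP meta quantizer, and a straight-through pass for Q in the backward path); the names `MetaQuantizer` and `quantize_binary` and all hyperparameters are hypothetical and are not taken from the authors' released code.

```python
# Minimal, runnable sketch of a MetaQuant-style training loop (Algorithm 1).
# Illustrative reconstruction only; the toy model, meta-quantizer architecture,
# and binary quantizer are assumptions, not the authors' implementation.
import torch
import torch.nn as nn


def quantize_binary(w):
    """Q(.): BWN-style binary quantization -- sign times mean absolute value."""
    return w.sign() * w.abs().mean()


class MetaQuantizer(nn.Module):
    """Shared meta quantizer M_phi: maps (gradient w.r.t. the quantized weight,
    current full-precision weight) to a gradient for the full-precision weight."""

    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, g_hat, w):
        inp = torch.stack([g_hat.reshape(-1), w.reshape(-1)], dim=1)  # (N, 2)
        return self.net(inp).reshape(w.shape)


torch.manual_seed(0)
W = torch.randn(4, 8)                     # full-precision weights of a toy "layer"
meta_q = MetaQuantizer()
meta_opt = torch.optim.Adam(meta_q.parameters(), lr=1e-3)   # gamma
alpha = 1e-3                              # base-model learning rate
x, y = torch.randn(32, 8), torch.randint(0, 4, (32,))
criterion = nn.CrossEntropyLoss()
g_hat = torch.zeros_like(W)               # g_{W_hat} from the previous iteration

for t in range(100):
    # Forward (Alg. 1, line 4): the meta quantizer's correction is part of the
    # graph, so the task loss is differentiable w.r.t. phi; a straight-through
    # trick stands in for the non-differentiable Q during backprop.
    correction = meta_q(g_hat, W)
    W_tilde = W - alpha * correction
    W_hat = W_tilde + (quantize_binary(W_tilde) - W_tilde).detach()
    loss = criterion(x @ W_hat.t(), y)    # Alg. 1, line 6

    # Alg. 1, line 7: gradient w.r.t. the quantized weights via the chain rule.
    g_hat, = torch.autograd.grad(loss, W_hat, retain_graph=True)

    # Alg. 1, lines 9 and 13: gradient of the task loss w.r.t. phi.
    meta_opt.zero_grad()
    loss.backward()

    # Alg. 1, lines 8 and 10-12: update the full-precision weights with the meta
    # gradient produced by the current M_phi, then update phi.
    with torch.no_grad():
        W -= alpha * meta_q(g_hat, W)
    meta_opt.step()
```

In this sketch the hand-crafted straight-through estimator is used only to let the loss reach φ; the base weights themselves are updated with the learned meta gradient, which is the behaviour the algorithm describes.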
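The learning-rate schedule quoted in the "Experiment Setup" row is a plain step decay. The snippet below is a hedged reconstruction of those schedules with PyTorch's built-in schedulers; the optimizer choice (Adam) and the placeholder parameter lists are assumptions, since the paper only states that SGD or Adam is used.

```python
# Hypothetical reconstruction of the reported learning-rate schedules.
import torch

model_params = [torch.nn.Parameter(torch.randn(2, 2))]  # placeholder base-model params
meta_params = [torch.nn.Parameter(torch.randn(2, 2))]   # placeholder meta-quantizer params

# CIFAR-10/100: alpha = gamma = 1e-3, both divided by 10 every 30 of 100 epochs.
base_opt = torch.optim.Adam(model_params, lr=1e-3)
meta_opt = torch.optim.Adam(meta_params, lr=1e-3)
base_sched = torch.optim.lr_scheduler.StepLR(base_opt, step_size=30, gamma=0.1)
meta_sched = torch.optim.lr_scheduler.StepLR(meta_opt, step_size=30, gamma=0.1)

# ImageNet (dorefa / BWN): alpha starts at 1e-4 and drops to 1e-5 / 1e-6 at epochs
# 10 and 20; gamma starts at 1e-3 and drops to 1e-4 / 1e-5 in step, 30 epochs total,
# batch size 128.
in_base_opt = torch.optim.Adam(model_params, lr=1e-4)
in_meta_opt = torch.optim.Adam(meta_params, lr=1e-3)
in_base_sched = torch.optim.lr_scheduler.MultiStepLR(in_base_opt, milestones=[10, 20], gamma=0.1)
in_meta_sched = torch.optim.lr_scheduler.MultiStepLR(in_meta_opt, milestones=[10, 20], gamma=0.1)

for epoch in range(100):
    # ... one epoch of CIFAR training with batch size 128 ...
    base_sched.step()
    meta_sched.step()
```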