MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization
Authors: Shangyu Chen, Wenya Wang, Sinno Jialin Pan
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted with CIFAR10/100 and ImageNet on various deep networks to demonstrate the advantage of our proposed method in terms of a faster convergence rate and better performance. |
| Researcher Affiliation | Academia | Shangyu Chen Nanyang Technological University, Singapore schen025@e.ntu.edu.sg Wenya Wang Nanyang Technological University, Singapore wangwy@ntu.edu.sg Sinno Jialin Pan Nanyang Technological University, Singapore sinnopan@ntu.edu.sg |
| Pseudocode | Yes | Algorithm 1 MetaQuant. Require: Training dataset {x, y}_n, well-trained full-precision base model W. Ensure: Quantized base model Ŵ. 1: Construct shared meta quantizer M_φ, training iteration t = 0. 2: while not optimal do 3: for Layer l from 1 to L do 4: Ŵ^t_l = Q(W̃^t_l) = Q(W^{t−1}_l − α · π(M_φ(g^{t−1}_{Ŵ_l}, W̃^{t−1}_l) · ∂W̃^{t−1}_l/∂W^{t−1}_l)) 5: end for 6: Calculate loss: ℓ = Loss(f(Q(W̃^t); x), y) 7: Generate g_{Ŵ^t} using the chain rule. 8: Calculate meta gradient g_{W̃^t} using M_φ. 9: Calculate ∂ℓ/∂φ^t by Eq. (8). 10: for Layer l from 1 to L do 11: W^t_l = W^{t−1}_l − α · π(M_φ(g^{t−1}_{Ŵ_l}, W̃^{t−1}_l) · ∂W̃^{t−1}_l/∂W^{t−1}_l) 12: end for 13: φ^{t+1} = φ^t − γ · ∂ℓ/∂φ^t (γ is the learning rate of the meta quantizer) 14: t = t + 1 15: end while (A runnable PyTorch sketch of this loop is given after the table.) |
| Open Source Code | Yes | Codes are released at: https://github.com/csyhhu/MetaQuant |
| Open Datasets | Yes | Three benchmark datasets are used including ImageNet ILSVRC-2012 and CIFAR10/100. |
| Dataset Splits | No | The paper mentions CIFAR10/100 and ImageNet ILSVRC-2012 but does not explicitly state train/validation/test splits; the use of ILSVRC-2012 implies the standard splits, but no details are provided. |
| Hardware Specification | Yes | MetaQuant costs 51.15 seconds to finish one iteration of training while the baseline method uses 38.17 s. However, in real deployment the meta quantizer is removed, so MetaQuant is able to provide better test performance without any extra inference time. (Intel Xeon CPU E5-1650 with GeForce GTX 750 Ti). |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies like Python, PyTorch, TensorFlow, or other libraries. It only mentions using 'SGD' or 'Adam' as optimization methods. |
| Experiment Setup | Yes | For experiments on CIFAR10/100, we set the initial learning rate as α = 1e-3 for base models and the initial learning rate as γ = 1e-3 for the meta quantizer. For fair comparison, we set total training epochs as 100 for all experiments; α and γ are divided by 10 after every 30 epochs. For ImageNet, the initial learning rate is set as α = 1e-4 for the base model using dorefa and BWN. Initial γ is set as 1e-3. α decreases to {1e-5, 1e-6} when training reaches 10/20 epochs. γ reduces to {1e-4, 1e-5} in accordance with the change of the learning rate in base models, with total epochs as 30. Batch size is 128 for CIFAR/ImageNet. (A scheduler sketch reflecting these settings also follows the table.) |
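
The pseudocode row couples three pieces: a shared meta quantizer M_φ that turns the gradient of the quantized weights into a gradient for the full-precision weights, a non-differentiable quantizer Q, and a base-model update that reuses that meta gradient. The sketch below illustrates the loop in PyTorch under simplifying assumptions (a toy linear layer, a BWN-style sign quantizer, an MLP meta quantizer, and a straight-through pass for Q in the backward path); the names `MetaQuantizer` and `quantize_binary` and all hyperparameters are hypothetical and are not taken from the authors' released code.

```python
# Minimal, runnable sketch of a MetaQuant-style training loop (Algorithm 1).
# Illustrative reconstruction only; the toy model, meta-quantizer architecture,
# and binary quantizer are assumptions, not the authors' implementation.
import torch
import torch.nn as nn


def quantize_binary(w):
    """Q(.): BWN-style binary quantization -- sign times mean absolute value."""
    return w.sign() * w.abs().mean()


class MetaQuantizer(nn.Module):
    """Shared meta quantizer M_phi: maps (gradient w.r.t. the quantized weight,
    current full-precision weight) to a gradient for the full-precision weight."""

    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, g_hat, w):
        inp = torch.stack([g_hat.reshape(-1), w.reshape(-1)], dim=1)  # (N, 2)
        return self.net(inp).reshape(w.shape)


torch.manual_seed(0)
W = torch.randn(4, 8)                     # full-precision weights of a toy "layer"
meta_q = MetaQuantizer()
meta_opt = torch.optim.Adam(meta_q.parameters(), lr=1e-3)   # gamma
alpha = 1e-3                              # base-model learning rate
x, y = torch.randn(32, 8), torch.randint(0, 4, (32,))
criterion = nn.CrossEntropyLoss()
g_hat = torch.zeros_like(W)               # g_{W_hat} from the previous iteration

for t in range(100):
    # Forward (Alg. 1, line 4): the meta quantizer's correction is part of the
    # graph, so the task loss is differentiable w.r.t. phi; a straight-through
    # trick stands in for the non-differentiable Q during backprop.
    correction = meta_q(g_hat, W)
    W_tilde = W - alpha * correction
    W_hat = W_tilde + (quantize_binary(W_tilde) - W_tilde).detach()
    loss = criterion(x @ W_hat.t(), y)    # Alg. 1, line 6

    # Alg. 1, line 7: gradient w.r.t. the quantized weights via the chain rule.
    g_hat, = torch.autograd.grad(loss, W_hat, retain_graph=True)

    # Alg. 1, lines 9 and 13: gradient of the task loss w.r.t. phi.
    meta_opt.zero_grad()
    loss.backward()

    # Alg. 1, lines 8 and 10-12: update the full-precision weights with the meta
    # gradient produced by the current M_phi, then update phi.
    with torch.no_grad():
        W -= alpha * meta_q(g_hat, W)
    meta_opt.step()
```

In this sketch the hand-crafted straight-through estimator is used only to let the loss reach φ; the base weights themselves are updated with the learned meta gradient, which is the behaviour the algorithm describes.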
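The learning-rate schedule quoted in the "Experiment Setup" row is a plain step decay. The snippet below is a hedged reconstruction of those schedules with PyTorch's built-in schedulers; the optimizer choice (Adam) and the placeholder parameter lists are assumptions, since the paper only states that SGD or Adam is used.

```python
# Hypothetical reconstruction of the reported learning-rate schedules.
import torch

model_params = [torch.nn.Parameter(torch.randn(2, 2))]  # placeholder base-model params
meta_params = [torch.nn.Parameter(torch.randn(2, 2))]   # placeholder meta-quantizer params

# CIFAR-10/100: alpha = gamma = 1e-3, both divided by 10 every 30 of 100 epochs.
base_opt = torch.optim.Adam(model_params, lr=1e-3)
meta_opt = torch.optim.Adam(meta_params, lr=1e-3)
base_sched = torch.optim.lr_scheduler.StepLR(base_opt, step_size=30, gamma=0.1)
meta_sched = torch.optim.lr_scheduler.StepLR(meta_opt, step_size=30, gamma=0.1)

# ImageNet (dorefa / BWN): alpha starts at 1e-4 and drops to 1e-5 / 1e-6 at epochs
# 10 and 20; gamma starts at 1e-3 and drops to 1e-4 / 1e-5 in step, 30 epochs total,
# batch size 128.
in_base_opt = torch.optim.Adam(model_params, lr=1e-4)
in_meta_opt = torch.optim.Adam(meta_params, lr=1e-3)
in_base_sched = torch.optim.lr_scheduler.MultiStepLR(in_base_opt, milestones=[10, 20], gamma=0.1)
in_meta_sched = torch.optim.lr_scheduler.MultiStepLR(in_meta_opt, milestones=[10, 20], gamma=0.1)

for epoch in range(100):
    # ... one epoch of CIFAR training with batch size 128 ...
    base_sched.step()
    meta_sched.step()
```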