Learning Low-precision Neural Networks without Straight-Through Estimator (STE)

Authors: Zhi-Gang Liu, Matthew Mattina

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the alpha-blending (AB) method, a 1-bit BinaryNet [Hubara et al., 2016a] is trained on CIFAR-10, and 8-bit and 4-bit MobileNet v1 and ResNet-50 v1/v2 are trained on ImageNet using the alpha-blending approach; the evaluation indicates that AB improves top-1 accuracy by 0.9%, 0.82%, and 2.93%, respectively, compared to the results of STE-based quantization [Hubara et al., 2016a; Krishnamoorthi, 2018].
Researcher Affiliation | Industry | Zhi-Gang Liu, Matthew Mattina, Arm Machine Learning Research Lab, {zhi-gang.liu, matthew.mattina}@arm.com
Pseudocode | Yes | Algorithm 1: Alpha-blending optimization (ABO); Algorithm 2: Progressive Project Quantization (PPQ)
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. Footnotes link to third-party models, frameworks, or datasets, not the authors' implementation of Alpha-Blending.
Open Datasets | Yes | To evaluate the AB method, a 1-bit BinaryNet [Hubara et al., 2016a] is trained on the CIFAR-10 dataset, and 8-bit and 4-bit MobileNet v1 and ResNet-50 v1/v2 are trained on ImageNet using the alpha-blending approach.
Dataset Splits | No | Figure 5 shows validation loss and accuracy curves, indicating that a validation set was used, but the paper does not specify the dataset split percentages or sample counts for training, validation, or testing, nor does it cite predefined splits for these proportions.
Hardware Specification | Yes | All evaluations were performed on an x86_64 Ubuntu Linux Xeon server (Lenovo P710) with a Titan V GPU.
Software Dependencies | No | The paper mentions that TensorFlow was used for training but does not provide specific version numbers for TensorFlow or other software libraries.
Experiment Setup | No | The paper describes the general alpha-blending training process: alpha is gradually increased over a user-defined optimization window [T0, T1], and the learning rate is scaled by (1 - alpha). However, it does not provide specific numerical values for these hyperparameters (e.g., T0, T1, initial learning rate, batch size, number of epochs, or optimizer configuration).
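
For context on the method assessed above, the mechanics summarized in the Research Type and Experiment Setup rows can be sketched in a few lines. The snippet below is a minimal, framework-agnostic illustration, not the authors' implementation: the symmetric uniform quantizer, the linear ramp for alpha, and the names `t0`, `t1`, and `base_lr` are placeholder assumptions standing in for the unspecified hyperparameters noted in the Experiment Setup row.

```python
# Minimal sketch of alpha-blending (AB) quantization-aware training.
# Assumptions (not from the paper): symmetric uniform quantizer, linear
# alpha ramp, and placeholder values for t0, t1, and base_lr.
import numpy as np

def quantize_uniform(w, num_bits=4):
    """Symmetric uniform quantizer standing in for q(w)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax + 1e-12
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def alpha_schedule(step, t0, t1):
    """Ramp alpha from 0 to 1 over the user-defined window [t0, t1]
    (linear ramp assumed; the paper's exact schedule may differ)."""
    if step <= t0:
        return 0.0
    if step >= t1:
        return 1.0
    return (step - t0) / (t1 - t0)

def ab_forward_weight(w, alpha, num_bits=4):
    """Blended weight used in the forward pass:
        w_ab = (1 - alpha) * w + alpha * q(w).
    Gradients reach w through the continuous (1 - alpha) * w term,
    so no straight-through estimator is required."""
    return (1.0 - alpha) * w + alpha * quantize_uniform(w, num_bits)

def ab_learning_rate(base_lr, alpha):
    """Per the report, the learning rate is scaled by (1 - alpha)."""
    return base_lr * (1.0 - alpha)

# Example usage with made-up numbers:
w = np.random.randn(8) * 0.1
for step in (0, 500, 1000):
    alpha = alpha_schedule(step, t0=0, t1=1000)
    w_ab = ab_forward_weight(w, alpha)
    lr = ab_learning_rate(base_lr=0.01, alpha=alpha)
```

At alpha = 1 the blended weight collapses to the quantized weight, which is why the choice of window [T0, T1] and the (1 - alpha) learning-rate scaling, left unspecified in the paper, matter for reproducing the reported accuracy.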