Learning Low-precision Neural Networks without Straight-Through Estimator (STE)
Authors: Zhi-Gang Liu, Matthew Mattina
IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the AB method, a 1-bit BinaryNet [Hubara et al., 2016a] on the CIFAR-10 dataset and 8-bit and 4-bit MobileNet v1 and ResNet50 v1/v2 on ImageNet are trained using the alpha-blending approach; the evaluation indicates that AB improves top-1 accuracy by 0.9%, 0.82%, and 2.93% respectively compared to the results of STE-based quantization [Hubara et al., 2016a; Krishnamoorthi, 2018]. |
| Researcher Affiliation | Industry | Zhi-Gang Liu, Matthew Mattina, Arm Machine Learning Research Lab, {zhi-gang.liu, matthew.mattina}@arm.com |
| Pseudocode | Yes | Algorithm 1 Alpha-blending optimization (ABO); Algorithm 2 Progressive Project Quantization (PPQ) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. Footnotes link to third-party models, frameworks, or datasets, not the authors' implementation of Alpha-Blending. |
| Open Datasets | Yes | To evaluate the AB method, a 1-bit BinaryNet [Hubara et al., 2016a] on the CIFAR-10 dataset and 8-bit and 4-bit MobileNet v1 and ResNet50 v1/v2 on ImageNet are trained using the alpha-blending approach |
| Dataset Splits | No | Figure 5 shows validation loss and accuracy curves, indicating a validation set was used, but the paper does not specify the dataset split percentages or sample counts for training, validation, or testing, nor does it cite predefined splits for these proportions. |
| Hardware Specification | Yes | All evaluations were performed on an x86_64 Ubuntu Linux-based Xeon server (Lenovo P710) with a Titan V GPU. |
| Software Dependencies | No | The paper mentions that TensorFlow was used for training but does not provide specific version numbers for TensorFlow or any other software libraries. |
| Experiment Setup | No | The paper describes the general alpha-blending training process: alpha is gradually increased over a user-defined optimization window [T0, T1], and the learning rate is scaled by (1-alpha). However, it does not provide specific numerical values for these hyperparameters (e.g., T0, T1, initial learning rate, batch size, number of epochs, or optimizer configuration). (A sketch of this loop follows the table.) |
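The Experiment Setup row describes the alpha-blending loop only at a high level. As a minimal sketch of that mechanism, the snippet below blends the float weight w with its quantized value as w_ab = (1-alpha)·w + alpha·quantize(w) and ramps alpha from 0 to 1 over the window [T0, T1]; because the gradient reaches w only through the differentiable (1-alpha)·w branch, the effective step size shrinks by (1-alpha), which is consistent with the learning-rate scaling quoted above. The linear ramp, the symmetric uniform quantizer, the helper names, and all hyperparameter values are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical helper names (quantize_symmetric, alpha_schedule,
# ab_training_step) -- not taken from the authors' code.

def quantize_symmetric(w, bits=8):
    """Uniform symmetric quantizer; a stand-in for the paper's quantizer."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax if np.any(w) else 1.0
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def alpha_schedule(step, t0, t1):
    """Ramp alpha from 0 to 1 over the user-defined window [T0, T1];
    the linear shape of the ramp is an assumption."""
    return float(np.clip((step - t0) / max(t1 - t0, 1), 0.0, 1.0))

def ab_training_step(w, grad_wab, lr, step, t0, t1, bits=8):
    """One alpha-blending update (sketch).

    The blended weight w_ab = (1 - alpha) * w + alpha * quantize(w) feeds
    the forward pass; the gradient reaches the float weight w only through
    the differentiable (1 - alpha) * w branch, so no straight-through
    estimator is needed and the update is scaled by (1 - alpha).
    """
    alpha = alpha_schedule(step, t0, t1)
    w = w - lr * (1.0 - alpha) * grad_wab            # (1 - alpha)-scaled step
    w_ab = (1.0 - alpha) * w + alpha * quantize_symmetric(w, bits)
    return w, w_ab

# Toy usage with made-up hyperparameters (the paper does not report T0, T1, or lr):
rng = np.random.default_rng(0)
w = rng.normal(size=8)
grad = rng.normal(size=8)   # would come from backprop through w_ab
w, w_ab = ab_training_step(w, grad, lr=0.01, step=500, t0=0, t1=1000, bits=4)
```

Once alpha reaches 1, w_ab equals the quantized weight exactly, so the network trained this way can be deployed with low-precision weights and no train/inference mismatch.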