Per-Tensor Fixed-Point Quantization of the Back-Propagation Algorithm
Authors: Charbel Sakr, Naresh Shanbhag
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate FX training on three deep learning benchmarks (CIFAR-10, CIFAR-100, SVHN) achieving high fidelity to our FL baseline in that we observe no loss of accuracy higher than 0.56% in all of our experiments. Our precision assignment is further shown to be within 1-b per-tensor of the minimum. We show that our precision assignment methodology reduces representational, computational, and communication costs of training by up to 6×, 8×, and 4×, respectively, compared to the FL baseline and related works. Section 4 is titled "NUMERICAL RESULTS". (A per-tensor quantization sketch follows this table.) |
| Researcher Affiliation | Academia | Charbel Sakr & Naresh Shanbhag, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. {sakr2,shanbhag}@illinois.edu |
| Pseudocode | No | The paper describes its methodology in text and mathematical formulas but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to a code repository for the described methodology. |
| Open Datasets | Yes | We employ three deep learning benchmarking datasets: CIFAR-10, CIFAR-100 (Krizhevsky and Hinton, 2009), and SVHN (Netzer et al., 2011). |
| Dataset Splits | No | Appendix E states, "The value of B^(min) is swept and p_m evaluated on the validation set." However, the paper does not specify exact train/validation/test split percentages or sample counts, nor does it reference predefined splits for these datasets. |
| Hardware Specification | Yes | All experiments were done using a Pascal P100 NVIDIA GPU. |
| Software Dependencies | No | The paper mentions general computing environments like GPUs and CPUs and concepts like 32-bit floating-point arithmetic, but does not provide specific software names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x). |
| Experiment Setup | Yes | The precision configuration C_o obtained via our proposed method, with targets p_m = 1%, β_0 = 5%, and η_0 = 1%, is depicted in Figure 2 for each of the four networks considered. ... The mini-batch size we used in all our experiments was 256. ... The smallest value of B^(min) resulting in p_m < 1% is equal to 4 bits. ... The smallest learning rate value used in the training, which in our case is 0.0001. (A hedged sketch of such a B^(min) sweep follows the table.) |
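Since the table only quotes the paper's setup and no source code is released, the following is a minimal NumPy sketch of generic per-tensor fixed-point quantization as named in the title: one precision B and one scale are shared across an entire tensor. The function name `quantize_per_tensor`, the max-magnitude range rule, and the rounding scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def quantize_per_tensor(x: np.ndarray, bits: int) -> np.ndarray:
    """Round a tensor to a B-bit fixed-point grid with one shared scale.

    Illustrative sketch only: the per-tensor dynamic range is taken from the
    tensor's max magnitude and mapped to a signed B-bit integer grid; the
    paper's own quantizer and range handling may differ.
    """
    max_abs = np.max(np.abs(x))
    if max_abs == 0:
        return np.zeros_like(x)
    # Signed B-bit grid: integer levels in [-(2^(B-1)), 2^(B-1) - 1].
    n_levels = 2 ** (bits - 1)
    scale = max_abs / (n_levels - 1)
    q = np.clip(np.round(x / scale), -n_levels, n_levels - 1)
    return q * scale

# Example: quantize a weight tensor to 4 bits, the B^(min) value quoted above.
w = np.random.randn(64, 64).astype(np.float32)
w_q = quantize_per_tensor(w, bits=4)
print("max quantization error:", np.max(np.abs(w - w_q)))
```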
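The Dataset Splits and Experiment Setup rows quote a sweep of B^(min) on the validation set until the mismatch probability p_m (the fraction of samples where the fixed-point network's prediction differs from the floating-point baseline's) falls below 1%. Below is a hedged sketch of such a sweep, assuming hypothetical `predict_fl` and `predict_fx` helpers; the paper's precision-assignment criterion is analytical, so this brute-force loop only illustrates the validation-set check the quotes describe.

```python
import numpy as np

def mismatch_probability(pred_fl: np.ndarray, pred_fx: np.ndarray) -> float:
    """Fraction of validation samples where FX and FL predictions disagree."""
    return float(np.mean(pred_fl != pred_fx))

def sweep_bmin(predict_fl, predict_fx, x_val, target_pm=0.01, b_range=range(2, 17)):
    """Return the smallest B^(min) whose mismatch probability is below target_pm.

    `predict_fl(x)` and `predict_fx(x, bits)` are hypothetical stand-ins for
    floating-point and fixed-point forward passes over the validation set.
    """
    pred_fl = predict_fl(x_val)
    for bits in b_range:
        pm = mismatch_probability(pred_fl, predict_fx(x_val, bits))
        if pm < target_pm:
            return bits, pm
    return None, None
```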