LEARNED STEP SIZE QUANTIZATION
Authors: Steven K. Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, Dharmendra S. Modha
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we present a method for training such networks, Learned Step Size Quantization, that achieves the highest accuracy to date on the ImageNet dataset when using models, from a variety of architectures, with weights and activations quantized to 2-, 3-, or 4-bits of precision, and that can train 3-bit models that reach full precision baseline accuracy. Table 1: Comparison of low precision networks on ImageNet. |
| Researcher Affiliation | Industry | Steven K. Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, Dharmendra S. Modha IBM Research San Jose, California, USA |
| Pseudocode | Yes | In this section we provide pseudocode to facilitate the implementation of LSQ. (An illustrative quantizer sketch appears below the table.) |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described in this paper, nor does it provide a direct link to a source-code repository. |
| Open Datasets | Yes | All experiments were conducted on the ImageNet dataset (Russakovsky et al., 2015) |
| Dataset Splits | Yes | Images were resized to 256 × 256, then a 224 × 224 crop was selected for training, with horizontal mirroring applied half the time. At test time, a 224 × 224 centered crop was chosen. (A preprocessing sketch appears below the table.) |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments (e.g., specific GPU/CPU models or processor details). |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not provide specific version numbers for key software components or libraries. |
| Experiment Setup | Yes | Networks were trained with a momentum of 0.9, using a softmax cross entropy loss function, and cosine learning rate decay without restarts (Loshchilov & Hutter, 2016). ... The initial learning rate was set to 0.1 for full precision networks, 0.01 for 2-, 3-, and 4-bit networks and to 0.001 for 8-bit networks. (A training-setup sketch appears below the table.) |
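
The Pseudocode row notes that the paper supplies pseudocode for implementing LSQ. For reference, below is a minimal PyTorch sketch of an LSQ-style quantizer, assuming the paper's formulation (round-to-nearest with a learnable step size, a straight-through estimator for rounding, and a gradient scale of 1/√(N·Q_P)); the class and function names are illustrative and not taken from any released code.

```python
import torch
import torch.nn as nn


def grad_scale(x, scale):
    # Forward: returns x unchanged; backward: gradient is multiplied by `scale`.
    return (x - x * scale).detach() + x * scale


def round_pass(x):
    # Round to nearest with a straight-through estimator (identity gradient).
    return (x.round() - x).detach() + x


class LsqQuantizer(nn.Module):
    """Illustrative LSQ quantizer sketch; not the authors' code."""

    def __init__(self, bits, is_activation=False):
        super().__init__()
        if is_activation:                      # unsigned data (activations)
            self.q_n, self.q_p = 0, 2 ** bits - 1
        else:                                  # signed data (weights)
            self.q_n, self.q_p = 2 ** (bits - 1), 2 ** (bits - 1) - 1
        self.step_size = nn.Parameter(torch.tensor(1.0))

    def init_step_size(self, v):
        # Initialization described in the paper: 2 * mean(|v|) / sqrt(Q_P).
        self.step_size.data = 2 * v.abs().mean() / (self.q_p ** 0.5)

    def forward(self, v):
        # Gradient scale g = 1 / sqrt(num_elements * Q_P).
        g = 1.0 / ((v.numel() * self.q_p) ** 0.5)
        s = grad_scale(self.step_size, g)
        v_bar = round_pass(torch.clamp(v / s, -self.q_n, self.q_p))
        return v_bar * s                       # quantized value at the original scale
```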
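
The Dataset Splits row quotes the ImageNet preprocessing. A possible torchvision rendering of that description is sketched below; the transform names and ordering are assumptions based on the quoted text, and normalization is omitted because it is not mentioned there.

```python
from torchvision import transforms

# Training: resize to 256 x 256, random 224 x 224 crop, horizontal flip half the time
# (an assumed torchvision rendering of the quoted preprocessing).
train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])

# Test: resize to 256 x 256, then a 224 x 224 center crop.
test_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
```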
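
The Experiment Setup row quotes the optimizer settings. A minimal PyTorch sketch of that configuration follows, assuming plain SGD and CosineAnnealingLR for cosine decay without restarts; the 0.01 learning rate corresponds to the 2-, 3-, and 4-bit case, and the model and epoch count are placeholders rather than values quoted in this row.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18()   # placeholder network; the paper evaluates several architectures
num_epochs = 90             # assumed epoch count; not stated in the quoted setup

# Initial learning rate from the paper: 0.1 (full precision), 0.01 (2/3/4-bit), 0.001 (8-bit).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Cosine learning rate decay without restarts (Loshchilov & Hutter, 2016).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

criterion = nn.CrossEntropyLoss()   # softmax cross entropy loss
```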