LEARNED STEP SIZE QUANTIZATION

Authors: Steven K. Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, Dharmendra S. Modha

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here, we present a method for training such networks, Learned Step Size Quantization, that achieves the highest accuracy to date on the ImageNet dataset when using models, from a variety of architectures, with weights and activations quantized to 2-, 3- or 4-bits of precision, and that can train 3-bit models that reach full precision baseline accuracy. Table 1: Comparison of low precision networks on ImageNet.
Researcher Affiliation | Industry | Steven K. Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, Dharmendra S. Modha, IBM Research, San Jose, California, USA
Pseudocode | Yes | In this section we provide pseudocode to facilitate the implementation of LSQ. (A hedged quantizer sketch is given after the table.)
Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described in this paper, nor does it provide a direct link to a source-code repository.
Open Datasets | Yes | All experiments were conducted on the ImageNet dataset (Russakovsky et al., 2015)
Dataset Splits | Yes | Images were resized to 256 × 256, then a 224 × 224 crop was selected for training, with horizontal mirroring applied half the time. At test time, a 224 × 224 centered crop was chosen. (See the preprocessing sketch after the table.)
Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments (e.g., specific GPU/CPU models or processor details).
Software Dependencies | No | The paper mentions 'PyTorch' but does not provide specific version numbers for key software components or libraries.
Experiment Setup | Yes | Networks were trained with a momentum of 0.9, using a softmax cross entropy loss function, and cosine learning rate decay without restarts (Loshchilov & Hutter, 2016). ... The initial learning rate was set to 0.1 for full precision networks, 0.01 for 2-, 3-, and 4-bit networks and to 0.001 for 8-bit networks. (See the training-setup sketch after the table.)
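
The Pseudocode row above quotes the paper's statement that it provides pseudocode for LSQ. As a rough illustration of how that quantizer can be implemented, the PyTorch sketch below follows the paper's description: quantize with a learned step size s, use a straight-through estimator for rounding, and scale the step-size gradient by 1/sqrt(N * Q_P). The helper and class names (grad_scale, round_pass, LsqQuantizer) and the initialization method are illustrative choices, not code released by the authors.

import math
import torch
import torch.nn as nn

def grad_scale(x, scale):
    # Forward pass: identity. Backward pass: gradient multiplied by `scale`.
    return (x - x * scale).detach() + x * scale

def round_pass(x):
    # Round to the nearest integer, with a straight-through gradient estimate.
    return (x.round() - x).detach() + x

class LsqQuantizer(nn.Module):
    def __init__(self, bits, is_activation=False):
        super().__init__()
        if is_activation:  # unsigned data (e.g., post-ReLU activations)
            self.q_n, self.q_p = 0, 2 ** bits - 1
        else:              # signed data (weights)
            self.q_n, self.q_p = 2 ** (bits - 1), 2 ** (bits - 1) - 1
        self.step = nn.Parameter(torch.tensor(1.0))  # learned step size s

    def init_step(self, x):
        # Step-size initialization described in the paper: 2 * mean(|x|) / sqrt(Q_P)
        self.step.data.copy_(2 * x.abs().mean() / math.sqrt(self.q_p))

    def forward(self, x):
        # Step-size gradient scale g = 1 / sqrt(N * Q_P), N = number of elements quantized
        g = 1.0 / math.sqrt(x.numel() * self.q_p)
        s = grad_scale(self.step, g)
        x = torch.clamp(x / s, -self.q_n, self.q_p)
        x = round_pass(x)
        return x * s

In use, a quantized layer would pass its weight tensor (and its input activations) through such a module before the convolution or matrix multiply.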
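
The Dataset Splits row quotes the ImageNet preprocessing. A minimal torchvision sketch of that pipeline might look as follows; the normalization step is omitted because the quote does not specify the statistics, and torchvision itself is an assumption (the paper names only PyTorch).

from torchvision import transforms

# Training-time preprocessing: resize to 256 x 256, take a random 224 x 224 crop,
# and mirror horizontally half the time.
train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])

# Test-time preprocessing: resize to 256 x 256, then take a centered 224 x 224 crop.
eval_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])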
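
Finally, the Experiment Setup row quotes the optimizer and learning rate schedule. A sketch of that configuration in PyTorch, for a 2-, 3-, or 4-bit network, is below; the model, epoch budget, and weight decay are placeholders not taken from the quote above.

import torch.nn as nn
import torch.optim as optim
from torchvision.models import resnet18

model = resnet18()                 # stand-in for a network with LSQ quantizers inserted
criterion = nn.CrossEntropyLoss()  # softmax cross entropy loss
optimizer = optim.SGD(model.parameters(),
                      lr=0.01,           # initial learning rate quoted for 2-, 3-, and 4-bit networks
                      momentum=0.9,
                      weight_decay=1e-4)  # placeholder; the quote does not give a weight decay
num_epochs = 90                           # placeholder epoch budget
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)  # cosine decay, no restarts

for epoch in range(num_epochs):
    # ... one training pass over the data would go here ...
    scheduler.step()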