Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets

Authors: Penghang Yin, Jiancheng Lyu, Shuai Zhang, Stanley Osher, Yingyong Qi, Jack Xin

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Moreover, we show that a poor choice of STE leads to instability of the training algorithm near certain local minima, which is verified with CIFAR-10 experiments.
Researcher Affiliation | Collaboration | Department of Mathematics, University of California, Los Angeles (yph@ucla.edu, sjo@math.ucla.edu); Department of Mathematics, University of California, Irvine (jianchel@uci.edu, jxin@math.uci.edu); Qualcomm AI Research, San Diego ({shuazhan,yingyong}@qti.qualcomm.com)
Pseudocode | Yes | Algorithm 1 Coarse gradient descent for learning two-linear-layer CNN with STE µ. (A minimal coarse-gradient-descent sketch follows the table.)
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository.
Open Datasets | Yes | In this section, we compare the performances of the identity, ReLU and clipped ReLU STEs on MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky, 2009) benchmarks for 2-bit or 4-bit quantized activations. (A clipped-ReLU STE quantizer sketch follows the table.)
Dataset Splits | Yes | The experimental results are summarized in Table 1, where we record both the training losses and validation accuracies. ... The schedule of the learning rate is specified in Table 2 in the appendix.
Hardware Specification | No | The paper mentions training on LeNet-5, VGG-11, and ResNet-20 models, but does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using 'stochastic (coarse) gradient descent with momentum = 0.9' and a 'modified batch normalization layer', but does not specify software names with version numbers (e.g., PyTorch, TensorFlow, or specific library versions).
Experiment Setup | Yes | The optimizer we use is the stochastic (coarse) gradient descent with momentum = 0.9 for all experiments. We train 50 epochs for LeNet-5... and 200 epochs for VGG-11 and ResNet-20... The schedule of the learning rate is specified in Table 2 in the appendix. (An illustrative training-setup sketch follows the table.)
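
For the Pseudocode row: the following is a minimal NumPy illustration of coarse gradient descent with an STE on a two-linear-layer network with binary activation. It is not the paper's Algorithm 1 verbatim (the loss, data model, and the particular surrogate µ there may differ); all names, shapes, and hyperparameters below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_act(z):
    # Quantized (binary) activation: its derivative is zero almost everywhere.
    return (z > 0).astype(float)

def relu_ste_grad(z):
    # Surrogate derivative mu'(z) of the ReLU STE, used in place of the true derivative.
    return (z > 0).astype(float)

# Toy instance of a two-linear-layer network y_hat = v^T sigma(Z w); shapes are illustrative.
Z = rng.standard_normal((16, 5))   # 16 "patches", each of dimension 5
w = rng.standard_normal(5)         # first-layer weights
v = rng.standard_normal(16)        # second-layer weights
y = 1.0                            # scalar target
lr = 0.1

for _ in range(100):
    pre = Z @ w                    # first-layer pre-activations
    act = binary_act(pre)          # forward pass uses the true quantized activation
    err = v @ act - y              # residual of the squared loss 0.5 * (v^T sigma(Zw) - y)^2
    grad_v = err * act             # exact gradient w.r.t. the second layer
    # Coarse gradient w.r.t. w: backpropagate through mu' instead of sigma'.
    grad_w = err * (Z.T @ (v * relu_ste_grad(pre)))
    v -= lr * grad_v
    w -= lr * grad_w
```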
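For the Open Datasets row's mention of 2-bit/4-bit quantized activations with identity, ReLU, or clipped-ReLU STEs, here is a hedged PyTorch sketch of a uniform activation quantizer whose backward pass uses the clipped-ReLU STE. The quantization range [0, 1], the class name `QuantActClippedReLU`, and the default bit width are assumptions for illustration, not the paper's exact quantizer.

```python
import torch

class QuantActClippedReLU(torch.autograd.Function):
    """b-bit uniform activation quantizer with a clipped-ReLU STE (illustrative only)."""

    @staticmethod
    def forward(ctx, x, bits=2):
        ctx.save_for_backward(x)
        levels = 2 ** bits - 1
        # Forward: clip to [0, 1], then round onto the uniform grid {0, 1/levels, ..., 1}.
        return torch.round(torch.clamp(x, 0.0, 1.0) * levels) / levels

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Backward: pretend the forward map was clip(x, 0, 1), so the gradient
        # passes through only where 0 < x < 1 (the clipped-ReLU STE).
        mask = ((x > 0) & (x < 1)).to(grad_output.dtype)
        return grad_output * mask, None   # no gradient for `bits`

# Usage: drop-in replacement for the activation in a quantized network.
x = torch.randn(4, 8, requires_grad=True)
y = QuantActClippedReLU.apply(x, 2)      # 2-bit activations
y.sum().backward()                       # gradients flow through the STE mask
```

Swapping the backward mask for all-ones (identity STE) or for `(x > 0)` alone (vanilla ReLU STE) gives the other two surrogates compared in the paper.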
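For the Experiment Setup row, a minimal PyTorch training-loop configuration matching the stated settings (SGD with momentum 0.9, 200 epochs for VGG-11/ResNet-20, a stepped learning-rate schedule). The model, data, initial learning rate, and milestone epochs below are placeholders; the actual schedule is the one in Table 2 of the paper's appendix.

```python
import torch

# Placeholder model and data; the paper trains LeNet-5, VGG-11 and ResNet-20
# on MNIST / CIFAR-10, which are not reproduced here.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
dataset = torch.utils.data.TensorDataset(
    torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,))
)
loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)

# Stated in the paper: stochastic (coarse) gradient descent with momentum = 0.9.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# Placeholder milestones/gamma; the real schedule is given in Table 2 of the appendix.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 140], gamma=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(200):                   # 200 epochs for VGG-11 / ResNet-20
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()                    # coarse gradients would flow through the chosen STE
        optimizer.step()
    scheduler.step()
```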