Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets
Authors: Penghang Yin, Jiancheng Lyu, Shuai Zhang, Stanley Osher, Yingyong Qi, Jack Xin
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Moreover, we show that a poor choice of STE leads to instability of the training algorithm near certain local minima, which is verified with CIFAR-10 experiments. |
| Researcher Affiliation | Collaboration | Department of Mathematics, University of California, Los Angeles (yph@ucla.edu, sjo@math.ucla.edu); Department of Mathematics, University of California, Irvine (jianchel@uci.edu, jxin@math.uci.edu); Qualcomm AI Research, San Diego ({shuazhan,yingyong}@qti.qualcomm.com) |
| Pseudocode | Yes | Algorithm 1: Coarse gradient descent for learning two-linear-layer CNN with STE µ. (A hedged sketch of the STE mechanism follows this table.) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | In this section, we compare the performances of the identity, ReLU and clipped ReLU STEs on MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky, 2009) benchmarks for 2-bit or 4-bit quantized activations. |
| Dataset Splits | Yes | The experimental results are summarized in Table 1, where we record both the training losses and validation accuracies. ... The schedule of the learning rate is specified in Table 2 in the appendix. |
| Hardware Specification | No | The paper mentions training on LeNet-5, VGG-11, and ResNet-20 models, but does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'stochastic (coarse) gradient descent with momentum = 0.9' and a 'modified batch normalization layer', but does not specify software names with version numbers (e.g., PyTorch, TensorFlow, or specific library versions). |
| Experiment Setup | Yes | The optimizer we use is the stochastic (coarse) gradient descent with momentum = 0.9 for all experiments. We train 50 epochs for LeNet-5... and 200 epochs for VGG-11 and ResNet-20... The schedule of the learning rate is specified in Table 2 in the appendix. (A minimal optimizer sketch follows this table.) |
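To make the comparison rows above concrete, here is a minimal sketch of activation quantization with the three STE backward proxies the paper compares (identity, ReLU, clipped ReLU). The paper does not name a framework (see the Software Dependencies row), so PyTorch, the scaling of activations to [0, 1], and all names below are assumptions for illustration, not the authors' implementation.

```python
import torch

class QuantizeSTE(torch.autograd.Function):
    """Uniform activation quantizer; the backward pass uses a chosen STE proxy."""

    @staticmethod
    def forward(ctx, x, bits, ste):
        ctx.save_for_backward(x)
        ctx.ste = ste
        levels = 2 ** bits - 1
        # Quantize clamped activations to 2^bits uniform levels on [0, 1]
        # (the [0, 1] range is an illustrative assumption).
        return torch.round(torch.clamp(x, 0.0, 1.0) * levels) / levels

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        if ctx.ste == "identity":
            grad_in = grad_out                                   # pass gradient through
        elif ctx.ste == "relu":
            grad_in = grad_out * (x > 0).to(grad_out.dtype)      # ReLU derivative proxy
        else:  # "clipped_relu": derivative of min(max(x, 0), 1)
            grad_in = grad_out * ((x > 0) & (x < 1)).to(grad_out.dtype)
        # No gradients for the non-tensor arguments `bits` and `ste`.
        return grad_in, None, None
```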
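A hypothetical usage snippet for the reported setup: the "coarse gradient descent" of Algorithm 1 is ordinary SGD with momentum 0.9 applied to the coarse gradients the STE produces. The learning rate here is a placeholder; the actual schedule is given in Table 2 of the paper's appendix and is not reproduced here.

```python
x = torch.randn(8, 16, requires_grad=True)
y = QuantizeSTE.apply(x, 2, "clipped_relu")       # 2-bit quantized activations
y.sum().backward()                                # coarse gradient via the STE proxy
opt = torch.optim.SGD([x], lr=0.1, momentum=0.9)  # lr is illustrative only
opt.step()
```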