Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets
Authors: Penghang Yin, Jiancheng Lyu, Shuai Zhang, Stanley Osher, Yingyong Qi, Jack Xin
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Moreover, we show that a poor choice of STE leads to instability of the training algorithm near certain local minima, which is verified with CIFAR-10 experiments. |
| Researcher Affiliation | Collaboration | Department of Mathematics, University of California, Los Angeles; Department of Mathematics, University of California, Irvine; Qualcomm AI Research, San Diego |
| Pseudocode | Yes | Algorithm 1 Coarse gradient descent for learning two-linear-layer CNN with STE µ. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | In this section, we compare the performances of the identity, ReLU and clipped ReLU STEs on MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky, 2009) benchmarks for 2-bit or 4-bit quantized activations. |
| Dataset Splits | Yes | The experimental results are summarized in Table 1, where we record both the training losses and validation accuracies. ... The schedule of the learning rate is specified in Table 2 in the appendix. |
| Hardware Specification | No | The paper mentions training on LeNet-5, VGG-11, and ResNet-20 models, but does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'stochastic (coarse) gradient descent with momentum = 0.9' and a 'modified batch normalization layer', but does not specify software names with version numbers (e.g., PyTorch, TensorFlow, or specific library versions). |
| Experiment Setup | Yes | The optimizer we use is the stochastic (coarse) gradient descent with momentum = 0.9 for all experiments. We train 50 epochs for LeNet-5... and 200 epochs for VGG-11 and ResNet-20... The schedule of the learning rate is specified in Table 2 in the appendix. |
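The STE comparison quoted in the table (identity, ReLU and clipped ReLU STEs on 2-bit or 4-bit quantized activations) can be illustrated with a minimal pure-Python sketch. It assumes a uniform b-bit quantized ReLU with step size `alpha` that saturates at (2^b - 1)·alpha; the functions `quantize`, `ste_grad` and `coarse_backward` are hypothetical names chosen for illustration and do not come from the paper, which, per the table, did not release code. The forward pass uses the piecewise-constant quantizer, whose true derivative is zero almost everywhere; the backward pass substitutes the derivative of a proxy function selected by the STE, loosely playing the role of the choice of µ in Algorithm 1.

```python
import math
from typing import List


def quantize(x: float, alpha: float, bits: int) -> float:
    """Assumed uniform b-bit quantized ReLU.

    Returns 0 for x <= 0, k * alpha for (k - 1) * alpha < x <= k * alpha,
    and saturates at (2**bits - 1) * alpha.
    """
    levels = 2 ** bits - 1
    if x <= 0:
        return 0.0
    k = min(math.ceil(x / alpha), levels)
    return k * alpha


def ste_grad(x: float, alpha: float, bits: int, kind: str) -> float:
    """Proxy derivative used in place of the quantizer's (a.e. zero) derivative."""
    if kind == "identity":
        # Identity STE: backpropagate as if the activation were f(x) = x.
        return 1.0
    if kind == "relu":
        # ReLU STE: backpropagate as if the activation were max(x, 0).
        return 1.0 if x > 0 else 0.0
    if kind == "clipped_relu":
        # Clipped ReLU STE: backpropagate as if the activation were
        # min(max(x, 0), (2**bits - 1) * alpha).
        upper = (2 ** bits - 1) * alpha
        return 1.0 if 0 < x < upper else 0.0
    raise ValueError(f"unknown STE kind: {kind}")


def coarse_backward(xs: List[float], grad_outs: List[float],
                    alpha: float, bits: int, kind: str) -> List[float]:
    """Chain rule through the quantized activation, with the true
    derivative replaced by the chosen STE proxy (a 'coarse gradient')."""
    return [g * ste_grad(x, alpha, bits, kind) for x, g in zip(xs, grad_outs)]


# Usage: 2-bit activations with step size 1.0.
xs = [-0.5, 0.3, 2.5, 4.0]
print([quantize(x, 1.0, 2) for x in xs])  # [0.0, 1.0, 3.0, 3.0]
for kind in ("identity", "relu", "clipped_relu"):
    print(kind, coarse_backward(xs, [1.0] * len(xs), 1.0, 2, kind))
# identity [1.0, 1.0, 1.0, 1.0]
# relu [0.0, 1.0, 1.0, 1.0]
# clipped_relu [0.0, 1.0, 1.0, 0.0]
```

In this sketch the three STEs share the same forward pass and differ only in the backward rule; the clipped ReLU STE additionally zeroes the gradient above the saturation level, where the quantizer output no longer changes.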