Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Coupled Gradient Estimators for Discrete Latent Variables
Authors: Zhe Dong, Andriy Mnih, George Tucker
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In systematic experiments, we show that our proposed categorical gradient estimators provide state-of-the-art performance, whereas even with additional Rao-Blackwellization, previous estimators (Yin et al., 2019) underperform a simpler REINFORCE with a leave-one-out-baseline estimator (Kool et al., 2019). |
| Researcher Affiliation | Industry | Zhe Dong (Google Research, Brain Team, EMAIL); Andriy Mnih (DeepMind, EMAIL); George Tucker (Google Research, Brain Team, EMAIL) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and additional information: https://sites.google.com/view/disarm-estimator. |
| Open Datasets | Yes | on three datasets: binarized MNIST (LeCun et al., 2010), Fashion MNIST (Xiao et al., 2017), and Omniglot (Lake et al., 2015). |
| Dataset Splits | Yes | We use the standard split into train, validation, and test sets. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions TensorFlow Probability without a version number and does not list other specific software dependencies with version numbers. |
| Experiment Setup | Yes | For most experiments, we used 32 latent variables with 64 categories unless specified otherwise. See Appendix A.2 for more details. |