Training Discrete Deep Generative Models via Gapped Straight-Through Estimator

Authors: Ting-Han Fan, Ta-Chung Chi, Alexander I. Rudnicky, Peter J. Ramadge

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that the proposed GST estimator enjoys better performance compared to strong baselines on two discrete deep generative modeling tasks, MNIST-VAE and ListOps.
Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ, USA; Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA.
Pseudocode | Yes | Algorithm 1: The Proposed GST Estimator (see the baseline sketch after this table).
Open Source Code | Yes | We release our code for both tasks at: https://github.com/chijames/GST.
Open Datasets | Yes | MNIST-VAE (Jang et al., 2017; Kingma et al., 2014) and ListOps unsupervised parsing (Nangia & Bowman, 2018).
Dataset Splits | Yes | To start with, we conduct an ablation study of the properties in Section 3. ... Table 2: Ablation study on the dev set. ... and test the model by replacing Xtrain with the testing data Xtest.
Hardware Specification | Yes | Table 5 is the computational efficiency comparison for the MNIST-VAE task on one Nvidia 1080-Ti GPU.
Software Dependencies | No | The paper mentions the optimizer (Adam) but does not specify versions for the software libraries or frameworks used.
Experiment Setup | Yes | For simplicity, we train all tasks using the same neural network structure, batch size (=100), epochs (=40), optimizer (Adam, learning rate=0.001), and seeds ([0, 1, ..., 9]) (see the configuration sketch after this table).
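
The Pseudocode row points to Algorithm 1, the proposed GST estimator, which is not reproduced here. For orientation only, the minimal PyTorch sketch below shows the standard Straight-Through Gumbel-Softmax estimator, a common baseline in this line of work; it is not the authors' GST algorithm, and the function name is ours.

    import torch
    import torch.nn.functional as F

    def straight_through_gumbel_softmax(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        """Discrete one-hot sample in the forward pass, relaxed softmax gradient in the backward pass."""
        # Sample Gumbel(0, 1) noise and form the tempered soft relaxation.
        gumbels = -torch.empty_like(logits).exponential_().log()
        y_soft = F.softmax((logits + gumbels) / tau, dim=-1)
        # Hard one-hot sample taken from the perturbed logits.
        index = y_soft.argmax(dim=-1, keepdim=True)
        y_hard = torch.zeros_like(logits).scatter_(-1, index, 1.0)
        # Straight-through trick: forward value is y_hard, gradient flows through y_soft.
        return y_hard - y_soft.detach() + y_soft

    # Equivalent built-in call: F.gumbel_softmax(logits, tau=1.0, hard=True)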
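
The Experiment Setup row lists the shared hyperparameters. As a hedged illustration of that configuration, the skeleton below wires them into a PyTorch training loop; build_model, build_loader, and the loss convention are placeholders and do not come from the released repository.

    import torch

    # Reported setup: batch size 100, 40 epochs, Adam with learning rate 0.001, seeds 0-9.
    BATCH_SIZE = 100
    EPOCHS = 40
    LEARNING_RATE = 1e-3
    SEEDS = range(10)

    def run_all_seeds(build_model, build_loader):
        for seed in SEEDS:
            torch.manual_seed(seed)
            model = build_model()
            loader = build_loader(batch_size=BATCH_SIZE)
            optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
            for epoch in range(EPOCHS):
                for batch in loader:
                    optimizer.zero_grad()
                    loss = model(batch)  # assumes the model returns its training loss
                    loss.backward()
                    optimizer.step()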