Training Discrete Deep Generative Models via Gapped Straight-Through Estimator
Authors: Ting-Han Fan, Ta-Chung Chi, Alexander I. Rudnicky, Peter J Ramadge
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that the proposed GST estimator enjoys better performance compared to strong baselines on two discrete deep generative modeling tasks, MNIST-VAE and ListOps. |
| Researcher Affiliation | Academia | ¹Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ, USA; ²Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA. |
| Pseudocode | Yes | Algorithm 1 The Proposed GST Estimator (an illustrative sketch follows the table) |
| Open Source Code | Yes | We release our code for both tasks at: https://github.com/chijames/GST. |
| Open Datasets | Yes | MNIST-VAE (Jang et al., 2017; Kingma et al., 2014) and ListOps unsupervised parsing (Nangia & Bowman, 2018). |
| Dataset Splits | Yes | To start with, we conduct an ablation study of the properties in Section 3. ... Table 2. Ablation study on the dev set. ... and test the model by replacing X_train with the testing data X_test. |
| Hardware Specification | Yes | Table 5 is the computational efficiency comparison for the MNIST-VAE task on one Nvidia 1080-Ti GPU. |
| Software Dependencies | No | The paper mentions optimizer (Adam) but does not specify versions for software libraries or frameworks used. |
| Experiment Setup | Yes | For simplicity, we train all tasks using the same neural network structure, batch size (=100), epochs (=40), optimizer (Adam, learning rate=0.001) and seeds ([0, 1, ..., 9]). |
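
For reference, below is a minimal PyTorch sketch of a gapped straight-through estimator in the spirit of the paper's Algorithm 1. It is not the authors' released implementation (see https://github.com/chijames/GST for that); the function name, the exact form of the two perturbation terms `m1` and `m2`, and the `gap` default are assumptions made for illustration only.

```python
import torch
import torch.nn.functional as F


def gapped_st_sample(logits, tau=1.0, gap=1.0, hard=True):
    """Gapped straight-through sample from categorical logits (illustrative sketch).

    Assumes `logits` has shape (batch, num_classes). The perturbations m1/m2
    below reconstruct the general idea and are not the reference code.
    """
    # Draw a one-hot sample D from Categorical(softmax(logits)).
    probs = F.softmax(logits, dim=-1)
    index = torch.multinomial(probs, num_samples=1)
    D = torch.zeros_like(logits).scatter_(-1, index, 1.0)

    # m1: lift the sampled logit up to the row maximum so D agrees with argmax (assumption).
    max_logit = logits.max(dim=-1, keepdim=True).values
    sampled_logit = (logits * D).sum(dim=-1, keepdim=True)
    m1 = (max_logit - sampled_logit) * D

    # m2: lower the other logits so the sampled one leads by at least `gap` (assumption).
    m2 = -F.relu(logits - max_logit + gap) * (1.0 - D)

    # Treat the perturbations as constants; gradients reach only `logits`.
    soft = F.softmax((logits + (m1 + m2).detach()) / tau, dim=-1)

    if hard:
        # Straight-through: hard one-hot forward pass, gradient of `soft` backward.
        return D + soft - soft.detach()
    return soft
```

Such a sample would sit in place of a Gumbel-Softmax straight-through sample, e.g. for the categorical latents of the MNIST-VAE task trained with the settings listed in the table (batch size 100, 40 epochs, Adam with learning rate 0.001, seeds 0-9).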