Bridging Discrete and Backpropagation: Straight-Through and Beyond
Authors: Liyuan Liu, Chengyu Dong, Xiaodong Liu, Bin Yu, Jianfeng Gao
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on various tasks demonstrate the superiority of ReinMax over the state of the art. |
| Researcher Affiliation | Industry | Liyuan Liu, Chengyu Dong, Xiaodong Liu, Bin Yu, Jianfeng Gao (Microsoft Research); {lucliu, v-chedong, xiaodl, v-ybi, jfgao}@microsoft.com |
| Pseudocode | Yes | Algorithm 1: ST. Input: θ: softmax input, τ: temperature. Output: D: one-hot samples. ... Algorithm 2: ReinMax. Input: θ: softmax input, τ: temperature. Output: D: one-hot samples. (Runnable sketches of both algorithms follow this table.) |
| Open Source Code | Yes | Implementations are available at https://github.com/microsoft/ReinMax. |
| Open Datasets | Yes | We benchmark the performance by training variational auto-encoders (VAE) with categorical latent variables on MNIST (LeCun et al., 1998). ... Specifically, we apply ReinMax to Bernoulli VAEs on MNIST, Fashion-MNIST (Xiao et al., 2017), and Omniglot (Lake et al., 2015), adhering closely to the experimental settings of Shi et al. (2022). |
| Dataset Splits | Yes | We also visualized the accuracy and loss on the validation set in Figure 3. ... Unless specified otherwise, we conduct a full grid search for all methods in all experiments, and report the best performance (averaged over 10 random seeds on MNIST-VAE and 5 random seeds on ListOps). The hyper-parameter search space is summarized in Table 7. |
| Hardware Specification | Yes | Most experiments (except efficiency comparisons) are conducted on Nvidia P40 GPUs. For efficiency comparisons, we measured the average time cost per batch and peak memory consumption on quadratic programming and MNIST-VAE on the same system with an idle A6000 GPU. (A generic per-batch profiling sketch follows this table.) |
| Software Dependencies | No | The paper mentions PyTorch as its automatic differentiation toolkit and names specific optimizers (Adam, RAdam), but it does not provide version numbers for these software components. |
| Experiment Setup | Yes | Unless specified otherwise, we conduct a full grid search for all methods in all experiments, and report the best performance (averaged over 10 random seeds on MNIST-VAE and 5 random seeds on ListOps). The hyper-parameter search space is summarized in Table 7. ... For our experiments on MNIST-VAE with 32 latent dimensions and 64 categorical dimensions, we set the batch size to 200, training steps to 5 × 10^5, and the activation function to LeakyReLU... For other experiments, we set the batch size to 100, the activation function to ReLU, and training steps to 9.6 × 10^4 (i.e., 160 epochs). (A configuration sketch restating these values follows this table.) |
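For concreteness, here is a minimal PyTorch sketch of the Straight-Through (ST) estimator summarized in Algorithm 1. The function name `straight_through_sample` and the use of `torch.multinomial` are our own choices rather than the paper's: the forward pass emits a one-hot sample while the backward pass routes gradients through the tempered softmax.

```python
import torch
import torch.nn.functional as F

def straight_through_sample(theta: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Straight-Through (ST) estimator: the forward value is a one-hot sample D,
    while gradients flow through softmax(theta / tau)."""
    p = F.softmax(theta / tau, dim=-1)                # relaxed distribution
    index = torch.multinomial(p, num_samples=1)       # one categorical draw per row
    d = torch.zeros_like(p).scatter_(-1, index, 1.0)  # one-hot sample D
    # Value equals d; gradient equals that of p (the straight-through trick).
    return (d - p).detach() + p
```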
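Algorithm 2 can be sketched in the same style. The version below is our reconstruction of ReinMax as a Heun-style, second-order correction to ST based on the paper's description; the exact temperature placement and the `pi_1`/`pi_2` bookkeeping reflect our reading, so the authors' implementation at https://github.com/microsoft/ReinMax should be treated as authoritative.

```python
import torch
import torch.nn.functional as F

def reinmax_sample(theta: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Hedged reconstruction of ReinMax: ST plus a second-order (Heun-style)
    correction, computed without Hessians or other second-order derivatives."""
    p = F.softmax(theta, dim=-1)
    index = torch.multinomial(p, num_samples=1)
    d = torch.zeros_like(p).scatter_(-1, index, 1.0)  # one-hot sample D
    # Midpoint between the sample and the tempered softmax (Heun's method).
    pi_1 = 0.5 * (d + F.softmax(theta / tau, dim=-1))
    # Re-attach pi_1 to the graph: its value is unchanged, but gradients
    # flow as if pi_1 = softmax(theta + const).
    pi_1 = F.softmax((pi_1.log() - theta).detach() + theta, dim=-1)
    pi_2 = 2.0 * pi_1 - 0.5 * p
    # Forward value is d; backward uses the second-order surrogate pi_2.
    return (d - pi_2).detach() + pi_2
```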
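The reported training setups can also be mirrored in a small configuration block. The values below only restate the numbers quoted in the Experiment Setup row; the dictionary and its keys (e.g. `mnist_vae_32x64`) are hypothetical labels of ours, not the paper's.

```python
import torch.nn as nn

# Hypothetical config names; the values restate the paper's reported setup.
EXPERIMENT_CONFIGS = {
    "mnist_vae_32x64": {            # 32 latent dims x 64 categorical dims
        "batch_size": 200,
        "train_steps": 500_000,     # 5 x 10^5
        "activation": nn.LeakyReLU,
    },
    "default": {                    # all other experiments
        "batch_size": 100,
        "train_steps": 96_000,      # 9.6 x 10^4, i.e., 160 epochs
        "activation": nn.ReLU,
    },
}
```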
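Finally, for the efficiency comparisons (average time per batch and peak memory on an idle A6000), a generic PyTorch measurement pattern is sketched below. The paper does not publish its profiling code, so `profile_step` and its interface are assumptions; only the measured quantities come from the paper.

```python
import time
import torch

def profile_step(step_fn, device: str = "cuda"):
    """Measure wall-clock time and peak GPU memory for one training batch."""
    torch.cuda.reset_peak_memory_stats(device)
    torch.cuda.synchronize(device)   # flush pending kernels before timing
    start = time.perf_counter()
    step_fn()                        # one forward/backward/update step
    torch.cuda.synchronize(device)   # wait for the step to finish
    elapsed = time.perf_counter() - start
    peak_mb = torch.cuda.max_memory_allocated(device) / 2**20
    return elapsed, peak_mb
```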