Direct Optimization through $\arg \max$ for Discrete Variational Auto-Encoder
Authors: Guy Lorberbom, Andreea Gane, Tommi Jaakkola, Tamir Hazan
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate empirically the effectiveness of the direct loss minimization technique in variational autoencoders with both unstructured and structured discrete latent variables. We begin our experiments by comparing the test loss of direct optimization, the Gumbel-Softmax (GSM) and the unbiased gradient computation in Equation (2). We performed these experiments using the binarized MNIST dataset [33], Fashion-MNIST [40] and Omniglot [20]. |
| Researcher Affiliation | Academia | Guy Lorberbom (Technion), Andreea Gane (MIT), Tommi Jaakkola (MIT), Tamir Hazan (Technion) |
| Pseudocode | Yes | Algorithm 1 Direct Optimization for discrete VAEs (a hedged sketch of one such gradient step appears below the table) |
| Open Source Code | No | The paper cites external code repositories for other methods (e.g., REBAR, RELAX, ARM) but does not provide an explicit statement or link for the source code of its own proposed method. |
| Open Datasets | Yes | We performed these experiments using the binarized MNIST dataset [33], Fashion-MNIST [40] and Omniglot [20]. |
| Dataset Splits | No | The paper mentions using binarized MNIST, Fashion-MNIST, and Omniglot datasets but does not explicitly provide details about specific training, validation, and test splits (e.g., percentages or sample counts for each split). |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, memory specifications, or cloud computing instance types used for running the experiments. |
| Software Dependencies | Yes | We use a general pairwise structured encoder where the arg max is recovered using the CPLEX algorithm [6]. IBM ILOG CPLEX. V12.1: User's manual for CPLEX. International Business Machines Corporation, 46(53):157, 2009. (A toy illustration of the pairwise arg max appears below the table.) |
| Experiment Setup | Yes | Following [12] we set our learning rate to 1e-3 and the annealing rate to 1e-5 and we used their annealing schedule every 1000 steps, setting the minimal ϵ to be 0.1. The architecture consists of an encoder X → FC(300) → ReLU → FC(K), a matching decoder K → FC(300) → ReLU → FC(X) and a BCE loss. (A PyTorch sketch of this setup appears below the table.) |
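
The Experiment Setup row quotes an encoder X → FC(300) → ReLU → FC(K), a matching decoder, and a BCE reconstruction loss. The following is a minimal PyTorch sketch of that architecture, assuming a single K-way categorical latent and the logits form of the BCE; the class and method names are ours, not taken from the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscreteVAE(nn.Module):
    """Hypothetical sketch of the quoted architecture:
    encoder  X -> FC(300) -> ReLU -> FC(K)
    decoder  K -> FC(300) -> ReLU -> FC(X), trained with a BCE loss.
    A single K-way categorical latent is assumed for simplicity."""

    def __init__(self, x_dim: int = 784, k: int = 10, hidden: int = 300):
        super().__init__()
        self.k = k
        self.encoder = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, k))
        self.decoder = nn.Sequential(
            nn.Linear(k, hidden), nn.ReLU(), nn.Linear(hidden, x_dim))

    def scores(self, x: torch.Tensor) -> torch.Tensor:
        """Unnormalised scores h_theta(x, z) over the K latent values."""
        return self.encoder(x)

    def recon_loss(self, z_onehot: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        """Per-example BCE reconstruction loss for a one-hot latent assignment."""
        logits = self.decoder(z_onehot)
        return F.binary_cross_entropy_with_logits(
            logits, x, reduction="none").sum(dim=-1)
```

The quoted hyperparameters (learning rate 1e-3, ϵ annealed at rate 1e-5 every 1000 steps down to a floor of 0.1) would sit on top of this sketch; the excerpt does not specify the optimizer.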
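
The Pseudocode row points to Algorithm 1 (Direct Optimization for discrete VAEs). The sketch below shows one way such a direct-loss-minimization gradient step could be implemented with automatic differentiation, reusing the hypothetical `DiscreteVAE` above: sample the latent with the Gumbel-max trick, recompute the arg max after perturbing the scores by ε times the reconstruction loss, and back-propagate through the surrogate (h[z_ε] − h[z]) / ε. The sign and scaling conventions, and the brute-force enumeration of all K latent values, reflect our reading of the approach rather than the authors' code; the KL term of the ELBO is omitted for brevity.

```python
import torch

def direct_step(model: "DiscreteVAE", x: torch.Tensor, eps: float = 1.0) -> torch.Tensor:
    """Hedged sketch of one direct-optimization training objective."""
    h = model.scores(x)                                    # (B, K) scores h_theta(x, z)
    u = torch.rand_like(h).clamp_min(1e-10)
    gumbel = -torch.log(-torch.log(u))                     # Gumbel(0, 1) noise
    z = torch.argmax(h + gumbel, dim=-1)                   # Gumbel-max sample of the latent

    eye = torch.eye(model.k, device=x.device)
    with torch.no_grad():
        # Reconstruction loss for every latent value (tractable only for small K).
        losses = torch.stack(
            [model.recon_loss(eye[j].expand(x.size(0), -1), x) for j in range(model.k)],
            dim=-1)                                        # (B, K)

    # Loss-perturbed arg max ("toward-loss" perturbation of the scores).
    z_eps = torch.argmax(h + gumbel + eps * losses, dim=-1)

    # Surrogate whose gradient w.r.t. the encoder approximates
    # (1/eps) * (grad h[z_eps] - grad h[z]); signs should be checked against the paper.
    surrogate = (h.gather(-1, z_eps.unsqueeze(-1))
                 - h.gather(-1, z.unsqueeze(-1))).squeeze(-1) / eps

    decoder_loss = model.recon_loss(eye[z], x)             # decoder trained by plain backprop
    return (surrogate + decoder_loss).mean()
```

Here `eps` plays the role of the annealed ϵ in the quoted setup (decayed toward its floor of 0.1 during training).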
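
The Software Dependencies row notes that the structured encoder's arg max is recovered with CPLEX [6]. The toy sketch below spells out what that arg max problem looks like for a handful of binary latent variables with unary and pairwise scores; it uses brute-force enumeration rather than CPLEX's ILP interface, so it only illustrates the objective the solver optimizes at scale. Variable names and the problem size are our own.

```python
import itertools
import numpy as np

def pairwise_argmax(unary: np.ndarray, pairwise: np.ndarray) -> np.ndarray:
    """Brute-force arg max over binary assignments z in {0,1}^n of
        score(z) = sum_i unary[i] * z[i] + sum_{i<j} pairwise[i, j] * z[i] * z[j].
    CPLEX (as cited in the paper) would solve the same problem as an ILP for larger n."""
    n = unary.shape[0]
    best_z, best_score = None, -np.inf
    for bits in itertools.product([0, 1], repeat=n):
        z = np.array(bits, dtype=float)
        score = unary @ z + z @ np.triu(pairwise, k=1) @ z
        if score > best_score:
            best_z, best_score = z, score
    return best_z

# Tiny example with 4 binary latent variables.
rng = np.random.default_rng(0)
print(pairwise_argmax(rng.normal(size=4), rng.normal(size=(4, 4))))
```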