Direct Optimization through $\arg \max$ for Discrete Variational Auto-Encoder

Authors: Guy Lorberbom, Andreea Gane, Tommi Jaakkola, Tamir Hazan

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We demonstrate empirically the effectiveness of the direct loss minimization technique in variational autoencoders with both unstructured and structured discrete latent variables. We begin our experiments by comparing the test loss of direct optimization, the Gumbel-Softmax (GSM) and the unbiased gradient computation in Equation (2). We performed these experiments using the binarized MNIST dataset [33], Fashion-MNIST [40] and Omniglot [20]." |
| Researcher Affiliation | Academia | Guy Lorberbom (Technion), Andreea Gane (MIT), Tommi Jaakkola (MIT), Tamir Hazan (Technion) |
| Pseudocode | Yes | "Algorithm 1: Direct Optimization for discrete VAEs" |
| Open Source Code | No | The paper cites external code repositories for other methods (e.g., REBAR, RELAX, ARM) but provides no statement or link for the source code of its own proposed method. |
| Open Datasets | Yes | "We performed these experiments using the binarized MNIST dataset [33], Fashion-MNIST [40] and Omniglot [20]." |
| Dataset Splits | No | The paper uses binarized MNIST, Fashion-MNIST, and Omniglot but does not specify training, validation, and test splits (e.g., percentages or sample counts per split). |
| Hardware Specification | No | The paper does not report hardware details such as CPU/GPU models, memory, or cloud instance types used for the experiments. |
| Software Dependencies | Yes | "We use a general pairwise structured encoder where the arg max is recovered using the CPLEX algorithm [6]." Reference [6]: IBM ILOG CPLEX. V12.1: User's Manual for CPLEX. International Business Machines Corporation, 46(53):157, 2009. |
| Experiment Setup | Yes | "Following [12] we set our learning rate to 1e-3 and the annealing rate to 1e-5 and we used their annealing schedule every 1000 steps, setting the minimal ϵ to be 0.1. The architecture consists of an encoder X → FC(300) → ReLU → FC(K), a matching decoder K → FC(300) → ReLU → FC(X) and a BCE loss." |
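
The "Pseudocode" row cites Algorithm 1, the paper's direct-optimization gradient estimator: the discrete latent is sampled as an arg max over Gumbel-perturbed encoder scores, and the gradient of the expected objective is approximated by a finite difference between a plain and an objective-perturbed arg max. Below is a minimal PyTorch sketch of that estimator for an unstructured categorical latent; the function name, tensor shapes, and sign convention are our assumptions, not the authors' released code (the table notes none exists).

```python
import torch

def direct_grad_surrogate(logits, f_values, eps=0.1):
    """Surrogate scalar whose gradient w.r.t. `logits` approximates the
    direct-optimization estimate of d/d(theta) E_gamma[ f(z*) ], where
    z* = argmax_z (h_theta(z) + gamma(z)) and gamma is Gumbel noise.

    logits:   (batch, K) encoder scores h_theta(x, z); requires grad
    f_values: (batch, K) downstream objective f(z) per category; detached
    eps:      finite-difference step, annealed toward 0 during training
    """
    u = torch.rand_like(logits).clamp_min(1e-10)   # avoid log(0)
    gumbel = -torch.log(-torch.log(u))             # Gumbel(0, 1) noise
    perturbed = logits + gumbel

    z_star = perturbed.argmax(dim=-1)                    # plain Gumbel-max sample
    z_eps = (perturbed + eps * f_values).argmax(dim=-1)  # f-perturbed arg max

    # (1/eps) * [h_theta(z_eps) - h_theta(z*)]: gradient flows only through
    # the encoder scores gathered at the two arg max indices.
    h_eps = logits.gather(-1, z_eps.unsqueeze(-1)).squeeze(-1)
    h_star = logits.gather(-1, z_star.unsqueeze(-1)).squeeze(-1)
    return ((h_eps - h_star) / eps).mean()
```

Calling `.backward()` on the returned scalar ascends the expected objective; for a loss one would flip the sign of the perturbation.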
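The Gumbel-Softmax (GSM) baseline mentioned in the "Research Type" row replaces the hard arg max with a temperature-controlled softmax over the same Gumbel-perturbed scores. A minimal sketch for a `(batch, K)` logits tensor follows; recent PyTorch also ships this operation as `torch.nn.functional.gumbel_softmax`.

```python
import torch

def gumbel_softmax_sample(logits, tau):
    """Continuous relaxation of a categorical sample (Jang et al. [12]):
    as tau -> 0 the output approaches a one-hot arg max, at the price of
    biased gradients."""
    u = torch.rand_like(logits).clamp_min(1e-10)
    gumbel = -torch.log(-torch.log(u))
    return torch.softmax((logits + gumbel) / tau, dim=-1)
```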
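The "Experiment Setup" row pins down the layer sizes, learning rate, and ϵ schedule. A sketch of that configuration, assuming the quoted architecture: the Adam optimizer, the 784-dimensional input (28×28 binarized MNIST), the value of K, and the exact exponential form of the annealing schedule are our readings of [12], not stated in the quoted text.

```python
import math
import torch
import torch.nn as nn

X, K = 784, 10  # assumed: 28x28 binarized images, K latent categories

# Encoder X -> FC(300) -> ReLU -> FC(K) and matching decoder, as quoted.
encoder = nn.Sequential(nn.Linear(X, 300), nn.ReLU(), nn.Linear(300, K))
decoder = nn.Sequential(nn.Linear(K, 300), nn.ReLU(), nn.Linear(300, X))
recon_loss = nn.BCEWithLogitsLoss(reduction="sum")  # the stated BCE loss

optimizer = torch.optim.Adam(  # optimizer choice is an assumption
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

def epsilon_at(step, rate=1e-5, floor=0.1, every=1000):
    """Annealing as we read the schedule of [12]: epsilon = exp(-rate * t),
    updated every `every` steps and floored at the stated minimum of 0.1."""
    t = (step // every) * every
    return max(floor, math.exp(-rate * t))
```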