Direct Optimization through $\arg \max$ for Discrete Variational Auto-Encoder
Authors: Guy Lorberbom, Andreea Gane, Tommi Jaakkola, Tamir Hazan
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate empirically the effectiveness of the direct loss minimization technique in variational autoencoders with both unstructured and structured discrete latent variables. We begin our experiments by comparing the test loss of direct optimization, the Gumbel-Softmax (GSM) and the unbiased gradient computation in Equation (2). We performed these experiments using the binarized MNIST dataset [33], Fashion-MNIST [40] and Omniglot [20]. |
| Researcher Affiliation | Academia | Guy Lorberbom (Technion), Andreea Gane (MIT), Tommi Jaakkola (MIT), Tamir Hazan (Technion) |
| Pseudocode | Yes | Algorithm 1 Direct Optimization for discrete VAEs (a hedged sketch of one such gradient step appears below the table) |
| Open Source Code | No | The paper cites external code repositories for other methods (e.g., REBAR, RELAX, ARM) but does not provide an explicit statement or link for the source code of its own proposed method. |
| Open Datasets | Yes | We performed these experiments using the binarized MNIST dataset [33], Fashion-MNIST [40] and Omniglot [20]. |
| Dataset Splits | No | The paper mentions using binarized MNIST, Fashion-MNIST, and Omniglot datasets but does not explicitly provide details about specific training, validation, and test splits (e.g., percentages or sample counts for each split). |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, memory specifications, or cloud computing instance types used for running the experiments. |
| Software Dependencies | Yes | We use a general pairwise structured encoder where the arg max is recovered using the CPLEX algorithm [6]. IBM ILOG CPLEX. V12.1: User's manual for CPLEX. International Business Machines Corporation, 46(53):157, 2009. (A toy illustration of the pairwise arg max appears below the table.) |
| Experiment Setup | Yes | Following [12] we set our learning rate to 1e-3 and the annealing rate to 1e-5 and we used their annealing schedule every 1000 steps, setting the minimal ϵ to be 0.1. The architecture consists of an encoder X → FC(300) → ReLU → FC(K), a matching decoder K → FC(300) → ReLU → FC(X) and a BCE loss. (A PyTorch sketch of this setup appears below the table.) |
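
The Experiment Setup row quotes an encoder X → FC(300) → ReLU → FC(K), a matching decoder, and a BCE reconstruction loss. The following is a minimal PyTorch sketch of that architecture, assuming a single K-way categorical latent and the logits form of the BCE; the class and method names are ours, not taken from the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscreteVAE(nn.Module):
    """Hypothetical sketch of the quoted architecture:
    encoder  X -> FC(300) -> ReLU -> FC(K)
    decoder  K -> FC(300) -> ReLU -> FC(X), trained with a BCE loss.
    A single K-way categorical latent is assumed for simplicity."""

    def __init__(self, x_dim: int = 784, k: int = 10, hidden: int = 300):
        super().__init__()
        self.k = k
        self.encoder = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, k))
        self.decoder = nn.Sequential(
            nn.Linear(k, hidden), nn.ReLU(), nn.Linear(hidden, x_dim))

    def scores(self, x: torch.Tensor) -> torch.Tensor:
        """Unnormalised scores h_theta(x, z) over the K latent values."""
        return self.encoder(x)

    def recon_loss(self, z_onehot: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        """Per-example BCE reconstruction loss for a one-hot latent assignment."""
        logits = self.decoder(z_onehot)
        return F.binary_cross_entropy_with_logits(
            logits, x, reduction="none").sum(dim=-1)
```

The quoted hyperparameters (learning rate 1e-3, ϵ annealed at rate 1e-5 every 1000 steps down to a floor of 0.1) would sit on top of this sketch; the excerpt does not specify the optimizer.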
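
The Pseudocode row points to Algorithm 1 (Direct Optimization for discrete VAEs). The sketch below shows one way such a direct-loss-minimization gradient step could be implemented with automatic differentiation, reusing the hypothetical `DiscreteVAE` above: sample the latent with the Gumbel-max trick, recompute the arg max after perturbing the scores by ε times the reconstruction loss, and back-propagate through the surrogate (h[z_ε] − h[z]) / ε. The sign and scaling conventions, and the brute-force enumeration of all K latent values, reflect our reading of the approach rather than the authors' code; the KL term of the ELBO is omitted for brevity.

```python
import torch

def direct_step(model: "DiscreteVAE", x: torch.Tensor, eps: float = 1.0) -> torch.Tensor:
    """Hedged sketch of one direct-optimization training objective."""
    h = model.scores(x)                                    # (B, K) scores h_theta(x, z)
    u = torch.rand_like(h).clamp_min(1e-10)
    gumbel = -torch.log(-torch.log(u))                     # Gumbel(0, 1) noise
    z = torch.argmax(h + gumbel, dim=-1)                   # Gumbel-max sample of the latent

    eye = torch.eye(model.k, device=x.device)
    with torch.no_grad():
        # Reconstruction loss for every latent value (tractable only for small K).
        losses = torch.stack(
            [model.recon_loss(eye[j].expand(x.size(0), -1), x) for j in range(model.k)],
            dim=-1)                                        # (B, K)

    # Loss-perturbed arg max ("toward-loss" perturbation of the scores).
    z_eps = torch.argmax(h + gumbel + eps * losses, dim=-1)

    # Surrogate whose gradient w.r.t. the encoder approximates
    # (1/eps) * (grad h[z_eps] - grad h[z]); signs should be checked against the paper.
    surrogate = (h.gather(-1, z_eps.unsqueeze(-1))
                 - h.gather(-1, z.unsqueeze(-1))).squeeze(-1) / eps

    decoder_loss = model.recon_loss(eye[z], x)             # decoder trained by plain backprop
    return (surrogate + decoder_loss).mean()
```

Here `eps` plays the role of the annealed ϵ in the quoted setup (decayed toward its floor of 0.1 during training).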
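
The Software Dependencies row notes that the structured encoder's arg max is recovered with CPLEX [6]. The toy sketch below spells out what that arg max problem looks like for a handful of binary latent variables with unary and pairwise scores; it uses brute-force enumeration rather than CPLEX's ILP interface, so it only illustrates the objective the solver optimizes at scale. Variable names and the problem size are our own.

```python
import itertools
import numpy as np

def pairwise_argmax(unary: np.ndarray, pairwise: np.ndarray) -> np.ndarray:
    """Brute-force arg max over binary assignments z in {0,1}^n of
        score(z) = sum_i unary[i] * z[i] + sum_{i<j} pairwise[i, j] * z[i] * z[j].
    CPLEX (as cited in the paper) would solve the same problem as an ILP for larger n."""
    n = unary.shape[0]
    best_z, best_score = None, -np.inf
    for bits in itertools.product([0, 1], repeat=n):
        z = np.array(bits, dtype=float)
        score = unary @ z + z @ np.triu(pairwise, k=1) @ z
        if score > best_score:
            best_z, best_score = z, score
    return best_z

# Tiny example with 4 binary latent variables.
rng = np.random.default_rng(0)
print(pairwise_argmax(rng.normal(size=4), rng.normal(size=(4, 4))))
```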