Learning Discrete Structured Variational Auto-Encoder using Natural Evolution Strategies
Authors: Alon Berliner, Guy Rotman, Yossi Adi, Roi Reichart, Tamir Hazan
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate empirically that optimizing discrete structured VAEs using NES is as effective as gradient-based approximations. Lastly, we prove NES converges for non-Lipschitz functions such as those that appear in discrete structured VAEs. |
| Researcher Affiliation | Collaboration | Alon Berliner (Technion, IIT), alon.berliner@gmail.com; Guy Rotman (Technion, IIT), rotmanguy@gmail.com; Yossi Adi (Meta AI Research), adiyoss@fb.com; Roi Reichart (Technion, IIT), roiri@technion.ac.il; Tamir Hazan (Technion, IIT), tamir.hazan@technion.ac.il |
| Pseudocode | Yes | Algorithm 1: Natural Evolution Strategies for discrete VAEs (a minimal NES sketch follows this table) |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the methodology described. |
| Open Datasets | Yes | In our experiments, we utilize the dataset developed by Paulus et al. [48]... We consider the Universal Dependencies (UD) dataset [35, 44, 45]... The experiments were conducted on the Fashion MNIST dataset [59] with fixed binarization [51]... Experiments are conducted on the Fashion MNIST [59], KMNIST [7], and Omniglot [29] datasets with fixed binarization [51] (the binarization step is illustrated after this table). |
| Dataset Splits | Yes | All reported values are measured on a test set, and the models were selected using early stopping on the validation set. |
| Hardware Specification | Yes | All the following experiments were conducted using an internal cluster with 4 Tesla-K80 NVIDIA GPUs. |
| Software Dependencies | No | The paper mentions some software components, but it does not provide a list of software dependencies or version numbers needed to reproduce the experiments. |
| Experiment Setup | Yes | We run our experiments with the same set of parameters as in Paulus et al. [48], except that during decoding we use teacher-forcing every 3 steps instead of 9 steps. We fix NES parameters to be σ = 0.01 and N = 600... We set the hyper-parameters to those of the original implementation of Kiperwasser & Goldberg [22] and feed the models with the multilingual FastText word embeddings [16]. We perform a grid-search for each of the methods separately over learning rates in [5×10⁻⁴, 1×10⁻⁵] and set the mini-batch size to 128. We fix NES parameters to be σ = 0.1 and N = 400. The Adam optimizer [21] is used to optimize all methods... All models were trained using the Adam optimizer [21] over 300 epochs with a constant learning rate of 10⁻³ and a batch size of 128. |
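
The pseudocode row refers to Algorithm 1 (NES for discrete VAEs). Below is a minimal sketch of the antithetic NES gradient estimator that such an algorithm relies on, written in NumPy with a toy quadratic loss standing in for the non-differentiable, possibly non-Lipschitz VAE objective. The values `sigma = 0.01` and `n_samples = 600` mirror the paper's reported NES parameters; the loss, dimensionality, and step size are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(theta):
    # Placeholder for the black-box objective evaluated at encoder parameters theta;
    # in the paper this would be the discrete structured VAE loss.
    return float(np.sum((theta - 1.0) ** 2))

def nes_gradient(theta, sigma=0.01, n_samples=600):
    """Estimate grad_theta E_eps[ loss(theta + sigma * eps) ] with antithetic sampling."""
    dim = theta.shape[0]
    grad = np.zeros(dim)
    for _ in range(n_samples // 2):
        eps = rng.standard_normal(dim)
        # Antithetic pair (+eps, -eps) reduces the variance of the estimator.
        f_plus = loss(theta + sigma * eps)
        f_minus = loss(theta - sigma * eps)
        grad += (f_plus - f_minus) * eps
    return grad / (n_samples * sigma)

theta = rng.standard_normal(5)
lr = 1e-2  # assumed step size for this toy example
for _ in range(200):
    theta = theta - lr * nes_gradient(theta)
print("final loss:", loss(theta))
```

Because the estimator only needs loss evaluations, it sidesteps backpropagation through discrete sampling; in the paper's experiments the resulting estimate is fed to Adam rather than the plain update used above.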
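The dataset rows mention "fixed binarization" for Fashion MNIST, KMNIST, and Omniglot. The snippet below is a sketch of a common reading of that preprocessing step (each pixel sampled once from a Bernoulli with probability equal to its grey-level intensity, then kept fixed for all epochs); this is an assumption about the cited procedure [51], shown on random data rather than the actual dataset files.

```python
import numpy as np

rng = np.random.default_rng(42)
images = rng.random((4, 28, 28))  # stand-in for grey-scale images scaled to [0, 1]
# Sample each pixel once; the resulting binary images are frozen for training.
binarized = rng.binomial(1, images).astype(np.float32)
```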