Adaptive Perturbation-Based Gradient Estimation for Discrete Latent Variable Models
Authors: Pasquale Minervini, Luca Franceschi, Mathias Niepert
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate our estimator on synthetic examples, as well as on Learning to Explain, Discrete Variational Auto-Encoders, and Neural Relational Inference tasks. |
| Researcher Affiliation | Academia | 1 School of Informatics, University of Edinburgh, Edinburgh, United Kingdom; 2 UCL Centre for Artificial Intelligence, London, United Kingdom; 3 University of Stuttgart, Stuttgart, Germany |
| Pseudocode | Yes | Algorithm 1: Central-Difference Perturbation-based Adaptive Implicit Maximum Likelihood Estimation (AIMLE); a sketch of this estimator appears below the table. |
| Open Source Code | Yes | All source code and datasets are available at https://github.com/EdinburghNLP/torch-adaptive-imle. |
| Open Datasets | Yes | The BEERADVOCATE dataset (McAuley, Leskovec, and Jurafsky 2012) consists of free-text reviews and ratings for 4 different aspects of beer: appearance, aroma, palate, and taste. |
| Dataset Splits | Yes | Since the original dataset (McAuley, Leskovec, and Jurafsky 2012) did not provide separate validation and test sets, following Niepert, Minervini, and Franceschi (2021), we compute 10 different evenly sized validation and test splits of the 10,000-example held-out set and report the mean and standard deviation over 10 models, each trained on one split (see the splitting sketch below the table). |
| Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) were found for the experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers were explicitly mentioned in the text. |
| Experiment Setup | Yes | In all our experiments, we fix the AIMLE hyper-parameters, setting the target gradient norm c to c = 1 and the update step η to η = 10⁻³, based on the AIMLE implementation described in Algorithm 1. ... We trained separate models for each aspect using MSE as the training loss, using the Adam (Kingma and Ba 2015) optimiser with its default hyper-parameters. ... The encoder and the decoder of the VAE consist of three dense layers, where the encoder and decoder activations have sizes 512-256-20×20 and 256-512-784, respectively. (Architecture sketched below the table.) |
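The pseudocode row refers to a central-difference, perturbation-based estimator whose perturbation strength is adapted towards a target gradient norm c with step size η. Below is a minimal PyTorch sketch of that idea, written from the algorithm's description rather than taken from the authors' repository; the function names (`aimle_step`, `map_solver`), the sign convention, and the exact adaptation rule for the perturbation strength are assumptions.

```python
import torch

def aimle_step(theta, grad_z, map_solver, lam, c=1.0, eta=1e-3):
    """Central-difference perturbation-based gradient estimate with an
    adaptive perturbation strength (sketch, not the reference implementation).

    theta      -- unnormalised scores of the discrete distribution
    grad_z     -- gradient of the downstream loss w.r.t. the discrete sample z
    map_solver -- maps scores to a discrete MAP state (e.g. argmax / top-k)
    lam        -- current perturbation strength, adapted across steps
    c, eta     -- target gradient norm and adaptation step (paper: c = 1, eta = 1e-3)
    """
    with torch.no_grad():
        # Central difference: perturb the scores along +/- the downstream
        # gradient and compare the resulting MAP states.
        z_plus = map_solver(theta + lam * grad_z)
        z_minus = map_solver(theta - lam * grad_z)
        grad_theta = (z_plus - z_minus) / (2.0 * lam)

        # Adapt the perturbation strength so that the norm of the estimate
        # tracks the target c (assumed update rule).
        lam = max(lam + eta * (c - grad_theta.norm().item()), 1e-8)
    return grad_theta, lam
```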
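The dataset-splits row describes producing 10 different evenly sized validation/test splits of the 10,000-example held-out set and averaging over the 10 resulting models. A small sketch of such a protocol follows; reading each split as a random halving into validation and test, and the shuffling and seeding, are assumptions.

```python
import numpy as np

def held_out_splits(num_examples=10_000, n_splits=10, seed=0):
    """Create n_splits different evenly sized validation/test splits of the
    held-out set (sketch; the authors' shuffling and seeding may differ)."""
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(n_splits):
        perm = rng.permutation(num_examples)
        half = num_examples // 2
        splits.append((perm[:half], perm[half:]))  # (validation, test) indices
    return splits

# One model is trained per split; the reported figures are the mean and
# standard deviation of the per-split metrics.
```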
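The experiment-setup row specifies a discrete VAE with dense encoder layers of sizes 512-256-(20×20) and decoder layers of sizes 256-512-784, trained with Adam at its default hyper-parameters. The sketch below instantiates those sizes in PyTorch; the 784-dimensional input (matching the decoder output), the ReLU activations, and the `discretise` hook standing in for the AIMLE sampling step are assumptions.

```python
import torch
import torch.nn as nn

class DiscreteVAE(nn.Module):
    """Dense encoder 784 -> 512 -> 256 -> 20x20 logits and decoder
    20x20 -> 256 -> 512 -> 784, following the quoted layer sizes."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(784, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 20 * 20),  # logits for 20 categorical variables with 20 states each
        )
        self.decoder = nn.Sequential(
            nn.Linear(20 * 20, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, 784),
        )

    def forward(self, x, discretise):
        logits = self.encoder(x.view(-1, 784))
        # `discretise` stands in for the AIMLE-based sampling of the discrete latent.
        z = discretise(logits.view(-1, 20, 20))
        return self.decoder(z.view(-1, 20 * 20))

model = DiscreteVAE()
optimiser = torch.optim.Adam(model.parameters())  # Adam with default hyper-parameters
```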