Adaptive Perturbation-Based Gradient Estimation for Discrete Latent Variable Models

Authors: Pasquale Minervini, Luca Franceschi, Mathias Niepert

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate our estimator on synthetic examples, as well as on Learning to Explain, Discrete Variational Auto-Encoders, and Neural Relational Inference tasks.
Researcher Affiliation | Academia | 1 School of Informatics, University of Edinburgh, Edinburgh, United Kingdom; 2 UCL Centre for Artificial Intelligence, London, United Kingdom; 3 University of Stuttgart, Stuttgart, Germany
Pseudocode | Yes | Algorithm 1: Central Difference Perturbation-based Adaptive Implicit Maximum Likelihood Estimation (AIMLE).
Open Source Code | Yes | All source code and datasets are available at https://github.com/EdinburghNLP/torch-adaptive-imle.
Open Datasets | Yes | The BEERADVOCATE dataset (McAuley, Leskovec, and Jurafsky 2012) consists of free-text reviews and ratings for 4 different aspects of beer: appearance, aroma, palate, and taste.
Dataset Splits | Yes | Since the original dataset (McAuley, Leskovec, and Jurafsky 2012) did not provide separate validation and test sets, following Niepert, Minervini, and Franceschi (2021), we compute 10 different evenly sized validation and test splits of the 10,000-example held-out set and report mean and standard deviation over 10 models, each trained on one split.
Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) were found for the experiments.
Software Dependencies | No | No specific software dependencies with version numbers were explicitly mentioned in the text.
Experiment Setup | Yes | In all our experiments, we fix the AIMLE hyper-parameters, setting the target gradient norm to c = 1 and the update step to η = 10^-3, following the AIMLE implementation described in Algorithm 1. ... We trained separate models for each aspect using MSE as the training loss and the Adam (Kingma and Ba 2015) optimiser with its default hyper-parameters. ... The encoder and the decoder of the VAE consist of three dense layers, where the encoder and decoder activations have sizes 512-256-20×20 and 256-512-784, respectively.
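The central-difference estimator named in Algorithm 1, and the adaptive update of the perturbation size λ driven by the target gradient norm c = 1 and step η = 10^-3, can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the one-hot argmax solver, the folded-in scaling of the estimate, and the exact form of the λ update are simplifying assumptions based only on the quoted description.

```python
import numpy as np

def argmax_map(theta):
    """MAP state of a simple one-hot discrete distribution
    (illustrative solver; real tasks use e.g. top-k or graph solvers)."""
    z = np.zeros_like(theta)
    z[np.argmax(theta)] = 1.0
    return z

def central_difference_grad(theta, grad_downstream, lam):
    """Central-difference perturbation-based gradient estimate:
    perturb the logits theta by +/- lam along the downstream loss
    gradient and difference the two MAP states. The 1/lam scale is
    assumed folded into the learning rate here."""
    z_pos = argmax_map(theta + lam * grad_downstream)
    z_neg = argmax_map(theta - lam * grad_downstream)
    return (z_pos - z_neg) / 2.0

def update_lambda(lam, grad_estimate, c=1.0, eta=1e-3):
    """Adapt lam so the estimator's gradient norm tracks the target c:
    grow lam when gradients vanish (norm below c), shrink it when they
    overshoot (assumed update rule for illustration)."""
    return max(lam + eta * (c - np.linalg.norm(grad_estimate)), 0.0)

# Toy usage: a strong downstream gradient flips the MAP state,
# yielding a non-zero estimate; a zero estimate would grow lam.
theta = np.array([1.0, 0.5])
g = np.array([2.0, 0.0])
est = central_difference_grad(theta, g, lam=1.0)
```

The adaptive step is the point of the method: with a fixed λ the discrete difference is often exactly zero (no state flips) or overly coarse, so λ is nudged until the estimator's gradient norm stays near c.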
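The evaluation protocol in the Dataset Splits row (ten evenly sized validation/test splits of the 10,000-example held-out set, one model per split) can be sketched as follows. This assumes each split halves the held-out set 50/50 and uses a fresh shuffle per split; the seed and index handling are illustrative, not the authors' exact procedure.

```python
import random

def make_splits(n_heldout=10000, n_splits=10, seed=0):
    """Create n_splits different evenly sized validation/test
    partitions of a held-out set of n_heldout indices."""
    rng = random.Random(seed)
    indices = list(range(n_heldout))
    splits = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        half = n_heldout // 2
        splits.append((sorted(indices[:half]), sorted(indices[half:])))
    return splits

splits = make_splits()
valid_idx, test_idx = splits[0]  # train/evaluate one model per split,
                                 # then report mean and std over the 10 runs
```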