Adaptive Perturbation-Based Gradient Estimation for Discrete Latent Variable Models

Authors: Pasquale Minervini, Luca Franceschi, Mathias Niepert

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate our estimator on synthetic examples, as well as on Learning to Explain, Discrete Variational Auto-Encoders, and Neural Relational Inference tasks.
Researcher Affiliation | Academia | 1 School of Informatics, University of Edinburgh, Edinburgh, United Kingdom; 2 UCL Centre for Artificial Intelligence, London, United Kingdom; 3 University of Stuttgart, Stuttgart, Germany
Pseudocode | Yes | Algorithm 1: Central Difference Perturbation-based Adaptive Implicit Maximum Likelihood Estimation (AIMLE).
Open Source Code | Yes | All source code and datasets are available at https://github.com/EdinburghNLP/torch-adaptive-imle.
Open Datasets | Yes | The BEERADVOCATE dataset (McAuley, Leskovec, and Jurafsky 2012) consists of free-text reviews and ratings for 4 different aspects of beer: appearance, aroma, palate, and taste.
Dataset Splits | Yes | Since the original dataset (McAuley, Leskovec, and Jurafsky 2012) did not provide separate validation and test sets, following Niepert, Minervini, and Franceschi (2021), we compute 10 different evenly sized validation and test splits of the 10,000-example held-out set and report mean and standard deviation over 10 models, each trained on one split.
Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) were found for the experiments.
Software Dependencies | No | No specific software dependencies with version numbers were explicitly mentioned in the text.
Experiment Setup | Yes | In all our experiments, we fix the AIMLE hyper-parameters, setting the target gradient norm to c = 1 and the update step to η = 10^-3, following the AIMLE implementation described in Algorithm 1. ... We trained separate models for each aspect using MSE as the training loss and the Adam (Kingma and Ba 2015) optimiser with its default hyper-parameters. ... The encoder and the decoder of the VAE consist of three dense layers, where the encoder and decoder activations have sizes 512-256-20×20 and 256-512-784, respectively.
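The central-difference estimator named in Algorithm 1, and the adaptive update of the perturbation size λ driven by the target gradient norm c = 1 and step η = 10^-3, can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the one-hot argmax solver, the folded-in scaling of the estimate, and the exact form of the λ update are simplifying assumptions based only on the quoted description.

```python
import numpy as np

def argmax_map(theta):
    """MAP state of a simple one-hot discrete distribution
    (illustrative solver; real tasks use e.g. top-k or graph solvers)."""
    z = np.zeros_like(theta)
    z[np.argmax(theta)] = 1.0
    return z

def central_difference_grad(theta, grad_downstream, lam):
    """Central-difference perturbation-based gradient estimate:
    perturb the logits theta by +/- lam along the downstream loss
    gradient and difference the two MAP states. The 1/lam scale is
    assumed folded into the learning rate here."""
    z_pos = argmax_map(theta + lam * grad_downstream)
    z_neg = argmax_map(theta - lam * grad_downstream)
    return (z_pos - z_neg) / 2.0

def update_lambda(lam, grad_estimate, c=1.0, eta=1e-3):
    """Adapt lam so the estimator's gradient norm tracks the target c:
    grow lam when gradients vanish (norm below c), shrink it when they
    overshoot (assumed update rule for illustration)."""
    return max(lam + eta * (c - np.linalg.norm(grad_estimate)), 0.0)

# Toy usage: a strong downstream gradient flips the MAP state,
# yielding a non-zero estimate; a zero estimate would grow lam.
theta = np.array([1.0, 0.5])
g = np.array([2.0, 0.0])
est = central_difference_grad(theta, g, lam=1.0)
```

The adaptive step is the point of the method: with a fixed λ the discrete difference is often exactly zero (no state flips) or overly coarse, so λ is nudged until the estimator's gradient norm stays near c.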
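The evaluation protocol in the Dataset Splits row (ten evenly sized validation/test splits of the 10,000-example held-out set, one model per split) can be sketched as follows. This assumes each split halves the held-out set 50/50 and uses a fresh shuffle per split; the seed and index handling are illustrative, not the authors' exact procedure.

```python
import random

def make_splits(n_heldout=10000, n_splits=10, seed=0):
    """Create n_splits different evenly sized validation/test
    partitions of a held-out set of n_heldout indices."""
    rng = random.Random(seed)
    indices = list(range(n_heldout))
    splits = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        half = n_heldout // 2
        splits.append((sorted(indices[:half]), sorted(indices[half:])))
    return splits

splits = make_splits()
valid_idx, test_idx = splits[0]  # train/evaluate one model per split,
                                 # then report mean and std over the 10 runs
```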