Grammar Variational Autoencoder

Authors: Matt J. Kusner, Brooks Paige, José Miguel Hernández-Lobato

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the GVAE on two tasks for generating discrete data: 1) generating simple arithmetic expressions and 2) generating valid molecules. We show that not only does our model produce a higher proportion of valid outputs than a character-based autoencoder, it also produces smoother latent representations. We also show that this learned latent space is effective for searching for arithmetic expressions that fit data, for finding better drug-like molecules, and for making accurate predictions about target properties.
Researcher Affiliation | Academia | Alan Turing Institute, University of Warwick, University of Cambridge.
Pseudocode | Yes | Algorithm 1: Sampling from the decoder (an illustrative sketch of this kind of stack-based, grammar-masked sampling appears after this table).
Open Source Code | Yes | Code available at: https://github.com/mkusner/grammarVAE
Open Datasets | Yes | The training data for the CVAE and GVAE models are 250,000 SMILES strings (Weininger, 1988) extracted at random from the ZINC database by Gómez-Bombarelli et al. (2016b).
Dataset Splits | No | The paper mentions a 'left-out test set with 10% of the data' for evaluating the GP model, but it does not specify a validation set for training the VAE or for hyperparameter tuning.
Hardware Specification | No | The paper does not report the hardware used for the experiments (e.g., GPU/CPU models or memory).
Software Dependencies | No | The paper names the kinds of networks used (e.g., LSTMs, GRUs, deep convolutional networks) but does not list software dependencies with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup | No | The paper describes the probabilistic setup of the VAE (q(z|X) is a Gaussian whose mean and variance are output by the encoder network, with an isotropic Gaussian prior p(z) = N(0, I)) and the optimization method (gradient descent), but it gives no concrete hyperparameter values such as learning rate, batch size, or number of epochs for VAE training, and it defers network-architecture details to the supplementary material. A minimal sketch of this setup appears after the table.
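
For readers unfamiliar with the decoder referenced by Algorithm 1, the Python sketch below illustrates the general idea of stack-based, grammar-masked sampling: at each step, the decoder's logits over production rules are masked so that only rules whose left-hand side matches the non-terminal on top of the stack can be chosen. The toy grammar, the RULES/NONTERMINALS names, and the random stand-in logits are illustrative assumptions, not the paper's arithmetic or SMILES grammars or its actual decoder.

```python
import numpy as np

# Hypothetical toy grammar: (left-hand-side non-terminal, right-hand-side symbols).
RULES = [
    ("S", ["S", "+", "T"]),
    ("S", ["T"]),
    ("T", ["x"]),
    ("T", ["1"]),
]
NONTERMINALS = {"S", "T"}

def sample_from_decoder(logits, start="S", rng=None):
    """Expand non-terminals left-to-right, masking rules by their left-hand side."""
    rng = rng or np.random.default_rng()
    stack, output = [start], []
    for step_logits in logits:                       # one logit vector per expansion
        # Move any terminals on top of the stack straight to the output.
        while stack and stack[-1] not in NONTERMINALS:
            output.append(stack.pop())
        if not stack:
            break
        symbol = stack.pop()
        # Mask: only rules whose left-hand side matches the current non-terminal.
        valid = np.array([lhs == symbol for lhs, _ in RULES])
        masked = np.where(valid, step_logits, -np.inf)
        probs = np.exp(masked - masked[valid].max())
        probs /= probs.sum()
        rule = rng.choice(len(RULES), p=probs)
        stack.extend(reversed(RULES[rule][1]))       # push RHS so the leftmost symbol is on top
    while stack and stack[-1] not in NONTERMINALS:   # flush trailing terminals
        output.append(stack.pop())
    return " ".join(output)

# Stand-in for the decoder's output: max_steps x num_rules logits (random for illustration).
fake_logits = np.random.default_rng(0).normal(size=(50, len(RULES)))
print(sample_from_decoder(fake_logits))
```

Because every sampled rule is drawn from the masked set, any string produced this way is syntactically valid by construction, which is the property the paper contrasts with character-based decoding.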
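
The setup quoted in the Experiment Setup row can be made concrete with a short sketch. The PyTorch code below (an assumed framework, not necessarily the one the authors used) shows an encoder producing the mean and log-variance of a Gaussian q(z|X), a reparameterized sample, the N(0, I) prior entering through the analytic KL term, and one gradient-descent step on the negative ELBO. The layer sizes, the MSE reconstruction term, the Adam optimizer, and the learning rate are illustrative placeholders, not values from the paper.

```python
import torch
import torch.nn as nn

INPUT_DIM, LATENT_DIM = 32, 8          # placeholder sizes, not from the paper

class ToyVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(INPUT_DIM, 64), nn.ReLU())
        self.to_mu = nn.Linear(64, LATENT_DIM)
        self.to_logvar = nn.Linear(64, LATENT_DIM)
        self.decoder = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(),
                                     nn.Linear(64, INPUT_DIM))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterized sample from q(z|X) = N(mu, diag(exp(logvar))).
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), mu, logvar

def negative_elbo(x, recon, mu, logvar):
    # Reconstruction term plus KL(q(z|X) || N(0, I)), both summed over dimensions.
    recon_loss = nn.functional.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl

model = ToyVAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # placeholder optimizer and rate
x = torch.randn(16, INPUT_DIM)                             # stand-in batch
recon, mu, logvar = model(x)
loss = negative_elbo(x, recon, mu, logvar)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

A full reproduction would still need the values the paper leaves unspecified: the actual encoder/decoder architectures from the supplementary material, the reconstruction likelihood over rule sequences, the learning rate, the batch size, and the number of epochs.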