Unbiased learning of deep generative models with structured discrete representations

Authors: Henry C Bendekgey, Gabe Hope, Erik Sudderth

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare models via their test likelihoods, the quality of generated data, and the quality of interpolations. We consider joint positions from human motion capture data (MOCAP [10, 26]) and audio spectrograms from recordings of people reading Wall Street Journal headlines (WSJ0 [19]); see Table 2.
Researcher Affiliation | Academia | Harry Bendekgey, Gabriel Hope, Erik B. Sudderth {hbendekg, hopej, sudderth}@uci.edu, Department of Computer Science, University of California, Irvine
Pseudocode | Yes | Algorithm 1 Structured Mean Field Variational Inference. Require: graphical model potentials η, network potentials λ = encode(x; ϕ). (µ_1, ..., µ_M) ← init_expected_stats(η, λ); while not converged do: for m = 1 to M do: ω_m ← MF(µ_{\m}; η); µ_m ← BP(ω_m; η, λ); return ω = concat(ω_1, ..., ω_M). (A hedged JAX-style sketch of this loop appears after the table.)
Open Source Code | Yes | Code can be found at https://github.com/hbendekgey/SVAE-Implicit. All methods were implemented with the JAX library [8].
Open Datasets | Yes | We consider joint positions from human motion capture data (MOCAP [10, 26]) and audio spectrograms from recordings of people reading Wall Street Journal headlines (WSJ0 [19]); see Table 2.
Dataset Splits | Yes | In total, our training dataset consisted of 53,443 sequences, our validation set contained 2,752 sequences, and our test set contained 25,893 sequences.
Hardware Specification | Yes | The total amount of computation across both datasets amounts to 6 GPU days for A10G GPUs on an EC2 instance.
Software Dependencies | No | All methods were implemented with the JAX library [8]. While JAX is mentioned, no specific version number for JAX or other software dependencies (e.g., Python, CUDA) is provided.
Experiment Setup | Yes | All experiments use a latent space with dimension D = 16. We train using the Adam [31] optimizer for neural network parameters, and stochastic natural gradient descent for graphical model parameters, with a batch size of B = 128 and learning rate 10⁻³ (the transformer DVAE uses learning rate 10⁻⁴, which improved performance). We train all methods for 200 epochs on MOCAP and 100 epochs on WSJ0 (including VAE pre-training for 10 epochs for SVAE/SIN methods), which we found was sufficient for convergence. (A hedged sketch of this optimizer configuration also follows the table.)
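
For readers mapping Algorithm 1 onto code, below is a minimal JAX-style sketch of its block coordinate-ascent loop. The helpers init_expected_stats, mean_field_update (the MF step), and belief_propagation (the BP step) are hypothetical placeholders standing in for the paper's graphical-model routines; only the control flow follows the pseudocode, and the convergence test is replaced by a fixed iteration budget.

```python
# Minimal sketch of Algorithm 1 (structured mean field variational inference).
# The three callables passed in are hypothetical placeholders, not the authors' API.
import jax.numpy as jnp

def structured_mean_field(eta, lam, init_expected_stats, mean_field_update,
                          belief_propagation, num_iters=50):
    """eta: graphical model potentials; lam = encode(x; phi): network potentials."""
    mus = init_expected_stats(eta, lam)        # expected statistics, one entry per block
    omegas = [None] * len(mus)
    for _ in range(num_iters):                 # "while not converged", as a fixed budget
        for m in range(len(mus)):
            # MF step: variational parameters for block m given the other blocks' stats.
            other_stats = mus[:m] + mus[m + 1:]
            omegas[m] = mean_field_update(other_stats, eta)
            # BP step: refresh block m's expected statistics under omega_m.
            mus[m] = belief_propagation(omegas[m], eta, lam)
    return jnp.concatenate(omegas)             # omega = concat(omega_1, ..., omega_M)
```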
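
Similarly, the reported training configuration (Adam on network weights, stochastic natural gradient descent on graphical-model parameters, B = 128, learning rate 10⁻³) could be wired up roughly as follows. This is a sketch assuming optax as the optimizer library; loss_fn, natural_gradient, and the parameter pytrees are illustrative placeholders, not the released code.

```python
# Sketch of the reported optimizer setup; optax is an assumption, and the
# loss / natural-gradient functions are placeholders for the paper's objectives.
import jax
import optax

LATENT_DIM = 16        # D = 16
BATCH_SIZE = 128       # B = 128
LEARNING_RATE = 1e-3   # the transformer DVAE reportedly uses 1e-4

net_optimizer = optax.adam(LEARNING_RATE)   # neural network parameters
gm_optimizer = optax.sgd(LEARNING_RATE)     # applied to natural-gradient directions

def train_step(net_params, gm_params, net_state, gm_state, batch,
               loss_fn, natural_gradient):
    # Ordinary gradients for the encoder/decoder networks.
    net_grads = jax.grad(loss_fn, argnums=0)(net_params, gm_params, batch)
    net_updates, net_state = net_optimizer.update(net_grads, net_state)
    net_params = optax.apply_updates(net_params, net_updates)

    # Natural-gradient directions for the graphical model, then a plain SGD step.
    gm_nat_grads = natural_gradient(net_params, gm_params, batch)
    gm_updates, gm_state = gm_optimizer.update(gm_nat_grads, gm_state)
    gm_params = optax.apply_updates(gm_params, gm_updates)

    return net_params, gm_params, net_state, gm_state

# Optimizer states would come from net_optimizer.init(net_params) and
# gm_optimizer.init(gm_params) before the first call to train_step.
```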