Unbiased learning of deep generative models with structured discrete representations
Authors: Henry C Bendekgey, Gabe Hope, Erik Sudderth
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare models via their test likelihoods, the quality of generated data, and the quality of interpolations. We consider joint positions from human motion capture data (MOCAP [10, 26]) and audio spectrograms from recordings of people reading Wall Street Journal headlines (WSJ0 [19]); see Table 2. |
| Researcher Affiliation | Academia | Harry Bendekgey, Gabriel Hope, Erik B. Sudderth {hbendekg, hopej, sudderth}@uci.edu Department of Computer Science, University of California, Irvine |
| Pseudocode | Yes | Algorithm 1 Structured Mean Field Variational Inference. Require: graphical model potentials η, network potentials λ = encode(x; ϕ). (µ_1, …, µ_M) ← init_expected_stats(η, λ); while not converged: for m ← 1 to M: ω_m ← MF(µ_¬m; η); µ_m ← BP(ω_m; η, λ); return ω = concat(ω_1, …, ω_M). (See the mean field sketch below the table.) |
| Open Source Code | Yes | Code can be found at https://github.com/hbendekgey/SVAE-Implicit. All methods were implemented with the JAX library [8]. |
| Open Datasets | Yes | We consider joint positions from human motion capture data (MOCAP [10, 26]) and audio spectrograms from recordings of people reading Wall Street Journal headlines (WSJ0 [19]); see Table 2. |
| Dataset Splits | Yes | In total, our training dataset consisted of 53,443 sequences, our validation set contained 2,752 sequences, and our test set contained 25,893 sequences. |
| Hardware Specification | Yes | The total amount of computation across both datasets amounts to 6 GPU days for A10G GPUs on an EC2 instance. |
| Software Dependencies | No | All methods were implemented with the JAX library [8]. While JAX is mentioned, no specific version number for JAX or other software dependencies (e.g., Python, PyTorch/TensorFlow, CUDA) is provided. |
| Experiment Setup | Yes | All experiments use latent space with dimension D = 16. We train using the Adam [31] optimizer for neural network parameters, and stochastic natural gradient descent for graphical model parameters, with a batch size of B = 128 and learning rate 10⁻³ (the transformer DVAE uses learning rate 10⁻⁴, which improved performance). We train all methods for 200 epochs on MOCAP and 100 epochs on WSJ0 (including VAE-pre-training for 10 epochs for SVAE/SIN methods), which we found was sufficient for convergence. (See the split-optimizer sketch below the table.) |
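The Algorithm 1 row describes block coordinate ascent: an MF step forms factor m's potentials from the other factors' expected statistics, and a BP step recomputes factor m's expected statistics. As a minimal illustration of that loop structure (not the paper's implementation), the JAX sketch below runs exact mean field updates on a toy multivariate Gaussian, where the BP step degenerates to reading off the factor mean; all function and variable names here are hypothetical.

```python
# Mean-field coordinate ascent on a toy Gaussian, mirroring the structure of
# Algorithm 1. The MF step below is the exact CAVI update for a multivariate
# Gaussian target x ~ N(mu, inv(Lam)) under q(x) = prod_m q(x_m).
import jax.numpy as jnp

mu = jnp.array([1.0, -2.0, 0.5])          # true mean
Lam = jnp.array([[2.0, 0.6, 0.2],         # precision (diagonally dominant,
                 [0.6, 1.5, 0.4],         # so the sweeps converge)
                 [0.2, 0.4, 1.0]])
M = mu.shape[0]

def mf_update(m, means):
    """Optimal mean of factor m given the other factors' current means."""
    # mean_m = mu_m - (1 / Lam_mm) * sum_{j != m} Lam_mj (mean_j - mu_j)
    coupling = jnp.dot(Lam[m], means - mu) - Lam[m, m] * (means[m] - mu[m])
    return mu[m] - coupling / Lam[m, m]

means = jnp.zeros(M)            # init_expected_stats
for _ in range(50):             # "while not converged", with a fixed budget
    for m in range(M):          # sweep over the M mean field factors
        means = means.at[m].set(mf_update(m, means))

print(means)  # converges to mu
```

For a Gaussian target these sweeps are Gauss–Seidel iterations on Λ(x − µ) = 0, so the factor means converge to the true mean µ; in the paper's setting the BP step is a genuine message-passing computation rather than this closed-form read-off.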
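The setup row pairs Adam for neural network parameters with stochastic natural gradient descent for graphical model parameters. Below is a hedged sketch of one way such a split can be wired up with optax's multi_transform; plain SGD stands in for the natural gradient step (which the paper applies to the graphical model's natural parameters), and the parameter trees and loss are placeholders, not the paper's code.

```python
# Hypothetical sketch: route network parameters to Adam and graphical model
# parameters to a separate update rule, as described in the setup row.
import jax
import jax.numpy as jnp
import optax

params = {
    "nn": jnp.zeros((16, 16)),   # stand-in for encoder/decoder weights
    "pgm": jnp.zeros(16),        # stand-in for graphical model parameters
}

# One optimizer per label; learning rate 1e-3 as reported in the paper.
tx = optax.multi_transform(
    {"adam": optax.adam(1e-3), "sgd": optax.sgd(1e-3)},
    param_labels={"nn": "adam", "pgm": "sgd"},
)
opt_state = tx.init(params)

def loss_fn(p, x):
    # Placeholder objective; the paper optimizes an ELBO over sequences.
    h = x @ p["nn"]
    return jnp.mean((h - p["pgm"]) ** 2)

@jax.jit
def train_step(p, opt_state, x):
    grads = jax.grad(loss_fn)(p, x)
    updates, opt_state = tx.update(grads, opt_state, p)
    return optax.apply_updates(p, updates), opt_state

x = jnp.ones((128, 16))  # batch size B = 128, latent dimension D = 16
params, opt_state = train_step(params, opt_state, x)
```

The label tree mirrors the parameter tree, so each subtree gets its own optimizer state; replacing the SGD branch with a natural-gradient transform would recover the paper's described setup.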