Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming
Authors: Xinlei Niu, Christian Walder, Jing Zhang, Charles Patrick Martin
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the usage of BDP in the latent space of variational autoencoders (VAEs) and propose the BDP-VAE which captures structured sparse optimal paths as latent variables. This enables end-to-end training for generative tasks in which models rely on unobserved structural information. At last, we validate the behavior of our approach and showcase its applicability in two real-world applications: text-to-speech and singing voice synthesis. |
| Researcher Affiliation | Collaboration | 1Australian National University, Canberra, Australia; 2Google DeepMind. |
| Pseudocode | Yes | Pseudocode to compute the location parameter µ and transition matrix π and sampling optimal path y are provided in Algorithm 1 and Algorithm 2. |
| Open Source Code | Yes | Our implementation code is available at https://github.com/XinleiNIU/LatentOptimalPathsBayesianDP. |
| Open Datasets | Yes | We apply the BDP-VAE framework with the computational graph of MA to perform an end-to-end TTS model on the RyanSpeech (Zandie et al., 2021) dataset. ... We extend the BDP-VAE with MA for end-to-end SVS on the PopCS dataset (Liu et al., 2022). ... we obtain latent optimal paths under the computational graph of DTW on the TIMIT speech corpus dataset (Garofolo et al., 1992) |
| Dataset Splits | Yes | We randomly split 2000 clips for validation and 9297 clips for training. |
| Hardware Specification | Yes | All experiments were performed on one NVIDIA GeForce RTX 3090. |
| Software Dependencies | No | The paper mentions software such as "pymcd.mcd" and the Griffin-Lim algorithm but does not specify version numbers for these or other key software components used in the experiments. |
| Experiment Setup | Yes | The model is trained with a batch size of 18, a learning rate of 1.25e-4, and a temperature parameter α of 5 for 700k steps. |
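
The Pseudocode row above refers to the paper's Algorithms 1 and 2, which compute the location parameters µ and transition matrix π of a Gibbs distribution over paths and then sample an optimal path y. The snippet below is a minimal, generic sketch of that kind of dynamic-programming path sampling on a topologically ordered DAG, not a reproduction of the paper's algorithms; the function name `sample_path`, the edge-weight representation, and the temperature handling are illustrative assumptions. (The log-sum-exp recursion corresponds to the location parameter of a Gumbel distribution over maximal path scores, which is the connection the paper's Gumbel propagation exploits.)

```python
import numpy as np


def sample_path(edge_w, alpha=1.0, rng=None):
    """Minimal sketch (not the paper's Algorithms 1-2): sample a source-to-sink
    path from a Gibbs distribution over paths in a topologically ordered DAG.

    edge_w : dict mapping node i -> {successor j: edge weight}, with i < j,
             node 0 as the source and the largest node index as the sink;
             every node is assumed to reach the sink.
    alpha  : temperature; smaller values concentrate mass on the optimal path.
    """
    rng = np.random.default_rng() if rng is None else rng
    sink = max(max(succ) for succ in edge_w.values())

    # Backward recursion: mu[v] = log of the sum over all v->sink paths of
    # exp(total path weight / alpha). These play the role of the location
    # parameters computed by dynamic programming.
    mu = {sink: 0.0}
    for v in sorted(edge_w, reverse=True):
        scores = np.array([edge_w[v][u] / alpha + mu[u] for u in edge_w[v]])
        m = scores.max()
        mu[v] = m + np.log(np.exp(scores - m).sum())

    # Forward pass: starting from the source, choose each successor with
    # probability proportional to exp(edge weight / alpha + mu[successor]),
    # i.e. the transition probabilities of the Gibbs distribution over paths.
    path, v = [0], 0
    while v != sink:
        succ = list(edge_w[v])
        logits = np.array([edge_w[v][u] / alpha + mu[u] for u in succ])
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        v = succ[rng.choice(len(succ), p=probs)]
        path.append(v)
    return path


# Example: a tiny 4-node DAG (0 -> {1, 2} -> 3); with a low temperature the
# sampled path concentrates on the highest-weight route 0 -> 2 -> 3.
edges = {0: {1: 1.0, 2: 2.0}, 1: {3: 1.0}, 2: {3: 1.0}}
print(sample_path(edges, alpha=0.5))
```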