SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization
Authors: Yuhta Takida, Takashi Shibuya, Wei-Hsiang Liao, Chieh-Hsin Lai, Junki Ohmura, Toshimitsu Uesaka, Naoki Murata, Shusuke Takahashi, Toshiyuki Kumakura, Yuki Mitsufuji
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply SQ-VAE in several vision- and speech-related tasks to demonstrate its improvement over the conventional VQ-VAE and VAE. All the experiments are repeated with three different random seeds, unless otherwise stated. |
| Researcher Affiliation | Industry | ¹Sony Group Corporation, Japan; ²Sony Corporation of America, USA. Correspondence to: Yuhta Takida <yuta.takida@sony.com>. |
| Pseudocode | Yes | The training procedures of Gaussian SQ-VAE and vMF SQ-VAE are described here in Algorithms 1 and 2, respectively. |
| Open Source Code | Yes | Our code is available at https://github.com/sony/sqvae. |
| Open Datasets | Yes | MNIST and Fashion-MNIST: They contain 28×28 grayscale images, which are categorized into 10 classes. We use the default train/test split (60,000/10,000 samples) and further split 10,000 samples from the training set as the validation set. VCTK: VCTK version 0.80 (Veaux et al., 2017)... ZeroSpeech 2019: ZeroSpeech 2019 English (Dunbar et al., 2019)... |
| Dataset Splits | Yes | MNIST and Fashion-MNIST ... We use the default train/test split (60,000/10,000 samples) and further split 10,000 samples from the training set as the validation set. CelebA ... We use the default train/valid/test split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as CPU/GPU models or memory specifications. |
| Software Dependencies | No | The paper mentions several tools and frameworks, such as "DeepMind Sonnet" and "kan-bayashi/ParallelWaveGAN", but does not provide the specific version numbers of these software dependencies that reproducibility requires. |
| Experiment Setup | Yes | We use the Adam optimizer (Kingma & Ba, 2015) with initial learning rates of 0.0003 and 0.001 for VQ-VAE and the other models, respectively. The learning rate is halved every 3 epochs if the validation loss is not improving. We train 100 epochs with a minibatch size of 32 for MNIST, Fashion-MNIST, and CIFAR10, and 70 epochs for CelebA. We set the VQ-VAE hyperparameter β in (4) and the EMA weight decay γ to 0.25 and 0.99, respectively... |
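The Dataset Splits row reports a 50,000/10,000 train/validation split carved out of the 60,000-sample MNIST training set. The paper does not say how those 10,000 validation samples are selected, so the sketch below assumes a seeded random split; dummy tensors stand in for the real MNIST data to avoid a download.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical stand-in for the 60,000-sample MNIST training set
# (28x28 grayscale images, 10 classes); real code would use torchvision.
full_train = TensorDataset(
    torch.zeros(60000, 1, 28, 28),
    torch.zeros(60000, dtype=torch.long),
)

# Split 10,000 samples off the training set for validation, as reported.
# The selection strategy is an assumption: a seeded random split.
train_set, val_set = random_split(
    full_train, [50000, 10000], generator=torch.Generator().manual_seed(0)
)
```

The remaining 10,000-sample default test split stays untouched for final evaluation.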
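The Experiment Setup row describes Adam with an initial learning rate of 0.001 (0.0003 for VQ-VAE), halved every 3 epochs when the validation loss stops improving. One plausible reading of that schedule, sketched here with a placeholder model, is PyTorch's `ReduceLROnPlateau` with `factor=0.5` and `patience=3`; the exact scheduler the authors used is not stated.

```python
import torch

model = torch.nn.Linear(4, 4)  # placeholder model, not the SQ-VAE architecture

# Initial learning rates from the paper: 0.0003 for VQ-VAE, 0.001 for the others.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Assumed reading of "halved every 3 epochs if the validation loss is
# not improving": halve the LR after 3 consecutive non-improving epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=3
)

for epoch in range(5):
    val_loss = 1.0  # stand-in: a constant, hence non-improving, validation loss
    scheduler.step(val_loss)

# After 4 consecutive non-improving epochs (> patience), the LR is halved once.
```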