SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization

Authors: Yuhta Takida, Takashi Shibuya, WeiHsiang Liao, Chieh-Hsin Lai, Junki Ohmura, Toshimitsu Uesaka, Naoki Murata, Shusuke Takahashi, Toshiyuki Kumakura, Yuki Mitsufuji

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply SQ-VAE in several vision- and speech-related tasks to demonstrate its improvement over the conventional VQ-VAE and VAE. All the experiments are repeated with three different random seeds, unless otherwise stated.
Researcher Affiliation | Industry | 1Sony Group Corporation, Japan; 2Sony Corporation of America, USA. Correspondence to: Yuhta Takida <yuta.takida@sony.com>.
Pseudocode | Yes | The training procedures of Gaussian SQ-VAE and vMF SQ-VAE are described here in Algorithms 1 and 2, respectively.
Open Source Code | Yes | Our code is available at https://github.com/sony/sqvae.
Open Datasets | Yes | MNIST and Fashion-MNIST: They contain 28×28 grayscale images, which are categorized into 10 classes. We use the default train/test split (60,000/10,000 samples) and further split 10,000 samples from the training set as the validation set. VCTK: VCTK version 0.80 (Veaux et al., 2017)... ZeroSpeech 2019: ZeroSpeech 2019 English (Dunbar et al., 2019)...
Dataset Splits | Yes | MNIST and Fashion-MNIST: ... We use the default train/test split (60,000/10,000 samples) and further split 10,000 samples from the training set as the validation set. CelebA: ... We use the default train/valid/test split. (See the data-split sketch after this table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as CPU/GPU models or memory specifications.
Software Dependencies | No | The paper mentions several tools and frameworks, such as DeepMind Sonnet and kan-bayashi/ParallelWaveGAN, but does not provide version numbers for these software dependencies, which reproducibility requires.
Experiment Setup | Yes | We use the Adam optimizer (Kingma & Ba, 2015) with initial learning rates of 0.0003 and 0.001 for VQ-VAE and the other models, respectively. The learning rate will be halved every 3 epochs if the validation loss is not improving. We train 100 epochs with the minibatch size of 32 for MNIST, Fashion-MNIST, and CIFAR10 and 70 epochs for CelebA. We set the VQ-VAE hyperparameter β in (4) and weight decay γ in EMA to 0.25 and 0.99, respectively... (See the optimizer and hyperparameter sketches after this table.)
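
The MNIST/Fashion-MNIST split quoted above (default 60,000/10,000 train/test, with 10,000 training samples held out for validation) is easy to reproduce. Below is a minimal sketch assuming torchvision as the data loader (the paper does not name one); the seed is an arbitrary placeholder, since the paper does not say how the validation subset is drawn.

```python
import torch
from torchvision import datasets, transforms

# Default MNIST train/test split: 60,000 / 10,000 samples.
full_train = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
test_set = datasets.MNIST("data", train=False, download=True,
                          transform=transforms.ToTensor())

# Hold out 10,000 of the 60,000 training samples as a validation set.
# The seed is a placeholder; the paper does not specify the sampling.
train_set, val_set = torch.utils.data.random_split(
    full_train, [50_000, 10_000],
    generator=torch.Generator().manual_seed(0))
```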
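The Experiment Setup row pins down the optimizer, learning rates, schedule, epoch counts, and batch size. The following is a minimal PyTorch sketch of that recipe; `model`, `train_loader`, and `validate` are hypothetical placeholders, and `ReduceLROnPlateau` is one plausible reading of "halved every 3 epochs if the validation loss is not improving", not a detail confirmed by the paper.

```python
import torch

is_vqvae = False  # 3e-4 for the VQ-VAE baseline, 1e-3 for the other models
optimizer = torch.optim.Adam(model.parameters(),
                             lr=3e-4 if is_vqvae else 1e-3)

# Halve the learning rate when the validation loss plateaus for 3 epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=3)

for epoch in range(100):         # 100 epochs for MNIST/Fashion-MNIST/CIFAR10
    for x, _ in train_loader:    # minibatch size of 32
        optimizer.zero_grad()
        loss = model(x)          # assumed to return the training objective
        loss.backward()
        optimizer.step()
    scheduler.step(validate(model))  # scheduler tracks the validation loss
```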
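The β = 0.25 and γ = 0.99 values in the same row configure the VQ-VAE baseline rather than SQ-VAE itself: β weights the commitment term in the VQ-VAE objective (the paper's Eq. 4), and γ is the decay of the exponential moving average used for the codebook update. A sketch of those two standard components, under the usual VQ-VAE definitions:

```python
import torch
import torch.nn.functional as F

def vq_losses(z_e: torch.Tensor, z_q: torch.Tensor, beta: float = 0.25):
    """Codebook and commitment terms of the standard VQ-VAE objective.

    With EMA codebook updates, the codebook term is typically dropped and
    only the beta-weighted commitment term remains in the loss.
    """
    codebook = F.mse_loss(z_q, z_e.detach())    # pulls codewords toward encodings
    commitment = F.mse_loss(z_e, z_q.detach())  # keeps encodings near codewords
    return codebook + beta * commitment

def ema_update(old: torch.Tensor, new: torch.Tensor, gamma: float = 0.99):
    """Exponential moving average with decay gamma, as in the EMA codebook update."""
    return gamma * old + (1.0 - gamma) * new
```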