Vector Quantized Wasserstein Auto-Encoder

Authors: Long Tung Vuong, Trung Le, He Zhao, Chuanxia Zheng, Mehrtash Harandi, Jianfei Cai, Dinh Phung

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct comprehensive experiments to demonstrate our key contributions by comparing with VQ-VAE (Van Den Oord et al., 2017) and SQ-VAE (Takida et al., 2022) (i.e., the recent work that can improve the codebook utilization). The experimental results show that our VQ-WAE can achieve better codebook utilization with higher codebook perplexity, hence leading to lower (compared with VQ-VAE) or comparable (compared with SQ-VAE) reconstruction error, with significantly lower reconstructed Fréchet Inception Distance (FID) score (Heusel et al., 2017). (Codebook perplexity is sketched in code after the table.)
Researcher Affiliation | Collaboration | (1) Monash University, Australia; (2) VinAI, Vietnam; (3) CSIRO's Data61, Australia; (4) University of Oxford, United Kingdom.
Pseudocode | Yes | Algorithm 1: VQ-WAE (a generic quantization step that the algorithm builds on is sketched after the table).
Open Source Code | No | The paper does not provide a direct link to a code repository or explicitly state that the source code for their method is released.
Open Datasets | Yes | Datasets: We empirically evaluate the proposed VQ-WAE in comparison with VQ-VAE (Van Den Oord et al., 2017), which is the baseline method, VQ-GAN (Esser et al., 2021), and the recently proposed SQ-VAE (Takida et al., 2022), which is the state-of-the-art work on improving codebook usage, on five different benchmark datasets: CIFAR10 (Van Den Oord et al., 2017), MNIST (Deng, 2012), SVHN (Netzer et al., 2011), the CelebA dataset (Liu et al., 2015; Takida et al., 2022), and the high-resolution image dataset FFHQ.
Dataset Splits | No | The paper mentions 'test-set reconstruction results' but does not explicitly provide details about training, validation, or test splits with percentages, absolute counts, or references to predefined standard splits.
Hardware Specification | Yes | On a system with an NVIDIA Tesla V100 GPU and dual Intel Xeon E5-2698 v4 CPUs, training VQ-WAE takes about 64 seconds per epoch on the CIFAR10 dataset, while training a standard VQ-VAE takes only approximately 40 seconds per epoch.
Software Dependencies | No | The paper mentions the use of an "adam optimizer" but does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages.
Experiment Setup | Yes | Additionally, in the primary setting, we use a codeword (discrete latent) dimensionality of 64 and codebook size |C| = 512 for all datasets except FFHQ, which has a codeword dimensionality of 256 and codebook size |C| = 1024, while the hyper-parameters {β, τ, λ} are specified as presented in the original papers, i.e., β = 0.25 for VQ-VAE and VQ-GAN (Esser et al., 2021), τ = 1e-5 for SQ-VAE, and λ = 1e-3, λ_r = 1.0 for our VQ-WAE. The details of the experimental settings are presented in Appendix D. (The reported values are collected into a configuration sketch below.)
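
The Research Type row measures codebook utilization by codebook perplexity. For reference, this is the conventional way that quantity is computed from code assignments; the function below is a minimal sketch (the name codebook_perplexity and all shapes are illustrative, not taken from the paper's code):

    import numpy as np

    def codebook_perplexity(assignments: np.ndarray, codebook_size: int) -> float:
        """Perplexity of the empirical codeword-usage distribution.

        assignments: integer array of codeword indices selected for each latent.
        """
        counts = np.bincount(assignments.ravel(), minlength=codebook_size)
        probs = counts / counts.sum()
        # exp of the Shannon entropy (natural log); the epsilon guards empty bins.
        entropy = -np.sum(probs * np.log(probs + 1e-10))
        return float(np.exp(entropy))

    # Toy check with the paper's primary codebook size of 512:
    rng = np.random.default_rng(0)
    idx = rng.integers(0, 512, size=512 * 100)
    print(codebook_perplexity(idx, codebook_size=512))  # close to 512 for near-uniform usage

Perplexity equals the codebook size when every codeword is used equally often and collapses toward 1 when only a few codewords are ever selected, which is why higher perplexity indicates better utilization.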
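The Pseudocode row refers to Algorithm 1 (VQ-WAE), which the paper frames in Wasserstein terms; that optimal-transport machinery is not reproduced here. As a baseline point of reference only, here is the standard nearest-codeword quantization with a straight-through gradient that VQ-VAE-style models share (a minimal PyTorch sketch; all names are illustrative and this is not the paper's Algorithm 1):

    import torch

    def nearest_codeword_quantize(z_e: torch.Tensor, codebook: torch.Tensor):
        """Generic VQ forward pass with a straight-through gradient.

        z_e:      encoder outputs, shape (N, D)
        codebook: codeword matrix, shape (K, D)
        """
        dists = torch.cdist(z_e, codebook) ** 2   # (N, K) squared distances
        indices = dists.argmin(dim=1)             # index of the nearest codeword
        z_q = codebook[indices]                   # (N, D) quantized latents
        # Straight-through estimator: gradients flow to z_e as if no rounding occurred.
        z_q = z_e + (z_q - z_e).detach()
        return z_q, indices

    # Toy usage with the paper's primary setting: D = 64, K = 512.
    z_e = torch.randn(128, 64, requires_grad=True)
    codebook = torch.randn(512, 64, requires_grad=True)
    z_q, indices = nearest_codeword_quantize(z_e, codebook)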
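For convenience, the hyper-parameters quoted in the Experiment Setup row can be collected in one place. This is only a restatement of the reported values; the dictionary layout, key names, and the role comments marked as inferred are ours, not the paper's:

    # Hyper-parameters restated from the quoted passage; key names are ours.
    PRIMARY_SETTING = {
        "codeword_dim": 64,     # discrete-latent dimensionality
        "codebook_size": 512,   # |C|
        "beta": 0.25,           # VQ-VAE / VQ-GAN, per the original papers
        "tau": 1e-5,            # SQ-VAE
        "lambda": 1e-3,         # λ for VQ-WAE (role not spelled out in the quote)
        "lambda_r": 1.0,        # λ_r for VQ-WAE (subscript suggests reconstruction; inferred)
        "optimizer": "Adam",    # mentioned in the paper; no library versions given
    }

    # FFHQ is the one stated exception to the primary setting:
    FFHQ_OVERRIDES = {"codeword_dim": 256, "codebook_size": 1024}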