Vector Quantized Wasserstein Auto-Encoder
Authors: Long Tung Vuong, Trung Le, He Zhao, Chuanxia Zheng, Mehrtash Harandi, Jianfei Cai, Dinh Phung
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments to demonstrate our key contributions by comparing with VQ-VAE (Van Den Oord et al., 2017) and SQ-VAE (Takida et al., 2022) (i.e., the recent work that can improve the codebook utilization). The experimental results show that our VQ-WAE can achieve better codebook utilization with higher codebook perplexity, hence leading to lower (compared with VQ-VAE) or comparable (compared with SQ-VAE) reconstruction error, with significantly lower reconstructed Fréchet Inception Distance (FID) score (Heusel et al., 2017). (A sketch of the perplexity computation follows the table.) |
| Researcher Affiliation | Collaboration | 1Monash University, Australia 2VinAI, Vietnam 3CSIRO's Data61, Australia 4University of Oxford, United Kingdom. |
| Pseudocode | Yes | Algorithm 1 VQ-WAE (a hedged sketch of the core idea follows the table) |
| Open Source Code | No | The paper does not provide a direct link to a code repository or explicitly state that the source code for their method is released. |
| Open Datasets | Yes | Datasets: We empirically evaluate the proposed VQ-WAE in comparison with VQ-VAE (Van Den Oord et al., 2017), the baseline method, VQ-GAN (Esser et al., 2021), and the recently proposed SQ-VAE (Takida et al., 2022), the state-of-the-art work on improving codebook usage, on five benchmark datasets: CIFAR10 (Van Den Oord et al., 2017), MNIST (Deng, 2012), SVHN (Netzer et al., 2011), the CelebA dataset (Liu et al., 2015; Takida et al., 2022), and the high-resolution image dataset FFHQ. |
| Dataset Splits | No | The paper mentions 'test-set reconstruction results' but does not explicitly provide details about training, validation, or test splits with percentages, absolute counts, or references to predefined standard splits. |
| Hardware Specification | Yes | Precisely, on a system with an NVIDIA Tesla V100 GPU and dual Intel Xeon E5-2698 v4 CPUs, training VQ-WAE takes about 64 seconds per epoch on the CIFAR10 dataset, while training a standard VQ-VAE takes only approximately 40 seconds per epoch. |
| Software Dependencies | No | The paper mentions the use of an "adam optimizer" but does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages. |
| Experiment Setup | Yes | Additionally, in the primary setting, we use a codeword (discrete latent) dimensionality of 64 and codebook size \|C\| = 512 for all datasets except FFHQ, which has a codeword dimensionality of 256 and codebook size \|C\| = 1024, while the hyper-parameters {β, τ, λ} are specified as presented in the original papers, i.e., β = 0.25 for VQ-VAE and VQ-GAN (Esser et al., 2021), τ = 1e-5 for SQ-VAE, and λ = 1e-3, λr = 1.0 for our VQ-WAE. The details of the experimental settings are presented in Appendix D. (These values are collected into a config sketch after the table.) |
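
The codebook perplexity cited in the Research Type row is the exponential of the entropy of the empirical codeword-usage distribution: it reaches the codebook size when all codewords are used uniformly and collapses toward 1 when only a few are used. Below is a minimal sketch of the conventional computation; the function name and batch shapes are illustrative, not taken from the paper.

```python
import numpy as np

def codebook_perplexity(code_indices: np.ndarray, codebook_size: int) -> float:
    """Perplexity of codeword usage: exp(entropy) of the empirical code
    distribution. Maximal (= codebook_size) under uniform usage; near 1
    under codebook collapse."""
    counts = np.bincount(code_indices.ravel(), minlength=codebook_size)
    probs = counts / counts.sum()
    nonzero = probs[probs > 0]
    return float(np.exp(-np.sum(nonzero * np.log(nonzero))))

# Hypothetical example: indices from quantizing a batch of 32 latent maps
# with a 512-entry codebook (the paper's primary setting).
indices = np.random.randint(0, 512, size=(32, 8, 8))
print(codebook_perplexity(indices, codebook_size=512))
```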
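The paper's Algorithm 1 is not reproduced in the table, so the following is only a hedged sketch of the central idea: regularizing encoder outputs toward the codewords through a Wasserstein (optimal-transport) distance, here approximated with log-domain Sinkhorn iterations. The function names, the entropic regularization `eps`, and the iteration count are assumptions; the authors' actual Algorithm 1 may differ in detail.

```python
import math
import torch

def sinkhorn_plan(cost: torch.Tensor, eps: float = 0.05, iters: int = 50) -> torch.Tensor:
    """Entropic optimal-transport plan between uniform marginals,
    computed with numerically stable log-domain Sinkhorn updates."""
    n, m = cost.shape
    log_a = torch.full((n,), -math.log(n), device=cost.device)  # uniform source weights
    log_b = torch.full((m,), -math.log(m), device=cost.device)  # uniform target weights
    log_K = -cost / eps                                         # Gibbs kernel (log space)
    log_u = torch.zeros(n, device=cost.device)
    log_v = torch.zeros(m, device=cost.device)
    for _ in range(iters):
        log_u = log_a - torch.logsumexp(log_K + log_v[None, :], dim=1)
        log_v = log_b - torch.logsumexp(log_K + log_u[:, None], dim=0)
    return torch.exp(log_u[:, None] + log_K + log_v[None, :])

def wasserstein_regularizer(z_e: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """OT cost between flattened encoder outputs z_e (n, d) and the
    codebook (|C|, d); a stand-in for the paper's Wasserstein term."""
    cost = torch.cdist(z_e, codebook).pow(2)  # squared-Euclidean cost matrix
    plan = sinkhorn_plan(cost)
    return (plan * cost).sum()

# Hypothetical training-step usage with the weights quoted in the table:
# loss = 1.0 * recon_loss + 1e-3 * wasserstein_regularizer(z_e, codebook)
```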
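For convenience, the primary setting quoted in the Experiment Setup row can be gathered in one place. The dictionary below is a hypothetical rendering with illustrative key names; only the numeric values come from the paper.

```python
# Illustrative config mirroring the quoted primary setting; key names are
# assumptions, not the authors' released configuration.
PRIMARY_SETTING = {
    "codeword_dim": 64,      # 256 for FFHQ
    "codebook_size": 512,    # 1024 for FFHQ
    "beta": 0.25,            # commitment weight for VQ-VAE / VQ-GAN
    "tau": 1e-5,             # SQ-VAE temperature
    "lambda_w": 1e-3,        # VQ-WAE Wasserstein-term weight λ
    "lambda_r": 1.0,         # VQ-WAE reconstruction weight λr
    "optimizer": "adam",     # named in the paper; library/version unspecified
}
```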