A Statistical Analysis of Wasserstein Autoencoders for Intrinsically Low-dimensional Data

Authors: Saptarshi Chakraborty, Peter Bartlett

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To bridge the gap between the theory and practice of WAEs, in this paper, we show that WAEs can learn the data distributions when the network architectures are properly chosen. We show that the convergence rates of the expected excess risk in the number of samples for WAEs are independent of the high feature dimension, instead relying only on the intrinsic dimension of the data distribution.
Researcher Affiliation | Collaboration | Saptarshi Chakraborty (UC Berkeley, saptarshic@berkeley.edu); Peter L. Bartlett (Google DeepMind & UC Berkeley, peter@berkeley.edu)
Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | The codes for this experimental study can be found at https://github.com/SaptarshiC98/WAE.
Open Datasets | Yes | We use a pre-trained Bi-directional GAN (Donahue et al., 2017) with 128 latent entries and outputs of size 128 × 128 × 3, trained on the ImageNet dataset (Deng et al., 2009). (See the sampling sketch after the table.)
Dataset Splits | No | We train a WAE model with the standard architecture as proposed by Tolstikhin et al. (2018) with the number of training samples varying in {2000, 4000, ..., 10000} and keep the last 1000 images for testing. The paper does not mention a validation split. (See the split sketch after the table.)
Hardware Specification | No | The paper mentions reducing image sizes "for computational ease" but does not specify any hardware (GPU or CPU models) used for the experiments.
Software Dependencies | No | The paper mentions using the "Adam optimizer (Kingma & Ba, 2015)" but does not name any software packages with version numbers for reproducibility.
Experiment Setup | Yes | For the latent distribution, we use the standard Gaussian distribution on the latent space R^8 and use the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.0001. We also take λ = 10 for the penalty on the dissimilarity in objective (4). After training for 10 epochs, we generate 1000 sample images from the distribution Ĝ_ν (see Section 3 for notations) and compute the Fréchet Inception Distance (FID) (Heusel et al., 2017) to assess the quality of the generated samples with respect to the target distribution. (See the training sketch after the table.)
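
The "Open Datasets" row quotes the paper's use of a pre-trained BiGAN to generate intrinsically low-dimensional images. The snippet below is a minimal sketch of that sampling step, assuming a `bigan_generator` callable that maps 128-dimensional Gaussian codes to 128 × 128 × 3 images; the handle and output shape are illustrative assumptions, not the authors' released code.

```python
# Sketch: sampling images x = G(z), z ~ N(0, I_128), from a pre-trained BiGAN
# generator, so the data have intrinsic dimension at most 128 even though they
# live in 128 x 128 x 3 ambient space. `bigan_generator` is a hypothetical
# handle to the pre-trained network, not part of the paper's released code.
import torch

LATENT_DIM = 128   # latent entries of the pre-trained BiGAN


@torch.no_grad()
def sample_synthetic_images(bigan_generator, n, device="cpu"):
    z = torch.randn(n, LATENT_DIM, device=device)
    x = bigan_generator(z)          # assumed output shape: (n, 3, 128, 128)
    return x
```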
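The "Dataset Splits" row describes a training-size sweep with the last 1000 images held out for testing and no validation split. A sketch of that indexing, assuming a pool of 11000 generated images (the exact pool size is an assumption), could look like this:

```python
import numpy as np

# `images` stands in for the pool of generated images; a tiny zero array is
# used here only so the indexing runs. The last 1000 images are held out for
# testing and the training-set size sweeps over {2000, 4000, ..., 10000}.
# No validation split is created, matching the paper.
images = np.zeros((11000, 8, 8, 3), dtype=np.float32)  # placeholder shape

test_images = images[-1000:]
for n_train in range(2000, 10001, 2000):
    train_images = images[:n_train]
    # ... fit a WAE on `train_images`, then evaluate FID using `test_images` ...
```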
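The "Experiment Setup" row fixes the latent prior (standard Gaussian on R^8), the Adam learning rate, the penalty weight λ = 10, and the FID evaluation, but the quoted excerpt does not pin down the form of the dissimilarity penalty. The sketch below assumes the MMD penalty with an inverse multiquadratic kernel from Tolstikhin et al. (2018); the encoder/decoder architectures and the FID tooling are likewise assumptions, so this is an illustration of the training loop, not the authors' implementation.

```python
import torch

LATENT_DIM = 8     # standard Gaussian prior on the latent space R^8
LAMBDA = 10.0      # weight on the dissimilarity penalty in objective (4)
LR = 1e-4          # Adam learning rate
EPOCHS = 10        # training epochs before generation


def imq_kernel(a, b, scale=2.0 * LATENT_DIM):
    """Inverse multiquadratic kernel k(a, b) = C / (C + ||a - b||^2)."""
    return scale / (scale + torch.cdist(a, b) ** 2)


def mmd_penalty(z_enc, z_prior):
    """Unbiased MMD^2 estimate between encoded codes and prior draws."""
    n = z_enc.size(0)
    off_diag = 1.0 - torch.eye(n, device=z_enc.device)
    k_qq = (imq_kernel(z_enc, z_enc) * off_diag).sum() / (n * (n - 1))
    k_pp = (imq_kernel(z_prior, z_prior) * off_diag).sum() / (n * (n - 1))
    k_qp = imq_kernel(z_enc, z_prior).mean()
    return k_qq + k_pp - 2.0 * k_qp


def wae_step(encoder, decoder, optimizer, x):
    """One step on: squared reconstruction error + LAMBDA * MMD penalty."""
    z = encoder(x)                     # codes in R^8
    x_rec = decoder(z)
    z_prior = torch.randn_like(z)      # draws from the N(0, I_8) prior
    loss = ((x - x_rec) ** 2).mean() + LAMBDA * mmd_penalty(z, z_prior)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Typical wiring (architectures are not specified in the quoted excerpt):
#   opt = torch.optim.Adam(
#       list(encoder.parameters()) + list(decoder.parameters()), lr=LR)
#   Train for EPOCHS epochs by calling wae_step on minibatches, then draw
#   fake = decoder(torch.randn(1000, LATENT_DIM))
#   and compare against the target images with an FID implementation such as
#   torchmetrics' FrechetInceptionDistance (tooling assumption).
```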