A Statistical Analysis of Wasserstein Autoencoders for Intrinsically Low-dimensional Data
Authors: Saptarshi Chakraborty, Peter Bartlett
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To bridge the gap between the theory and practice of WAEs, in this paper, we show that WAEs can learn the data distributions when the network architectures are properly chosen. We show that the convergence rates of the expected excess risk in the number of samples for WAEs are independent of the high feature dimension, instead relying only on the intrinsic dimension of the data distribution. |
| Researcher Affiliation | Collaboration | Saptarshi Chakraborty, UC Berkeley (saptarshic@berkeley.edu); Peter L. Bartlett, Google DeepMind & UC Berkeley (peter@berkeley.edu) |
| Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The codes for this experimental study can be found at https://github.com/SaptarshiC98/WAE. |
| Open Datasets | Yes | We use a pre-trained Bi-directional GAN (Donahue et al., 2017) with 128 latent entries and outputs of size 128 × 128 × 3, trained on the ImageNet dataset (Deng et al., 2009). |
| Dataset Splits | No | We train a WAE model with the standard architecture as proposed by Tolstikhin et al. (2018) with the number of training samples varying in {2000, 4000, ..., 10000} and keep the last 1000 images for testing. The paper does not mention a validation split. |
| Hardware Specification | No | The paper mentions reducing image sizes "for computational ease" but does not specify any hardware (GPU, CPU models, etc.) used for experiments. |
| Software Dependencies | No | The paper mentions using "Adam optimizer (Kingma & Ba, 2015)" but does not specify any software names with version numbers for reproducibility. |
| Experiment Setup | Yes | For the latent distribution, we use the standard Gaussian distribution on the latent space R^8 and use the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.0001. We also take λ = 10 for the penalty on the dissimilarity in objective (4). After training for 10 epochs, we generate 1000 sample images from the distribution Ĝ_ν (see Section 3 for notations) and compute the Fréchet Inception Distance (FID) (Heusel et al., 2017) to assess the quality of the generated samples with respect to the target distribution. |
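
The experiment setup row above describes a fairly standard WAE training configuration: a standard Gaussian prior on an 8-dimensional latent space, Adam with learning rate 0.0001, penalty weight λ = 10, 10 training epochs, and FID computed on 1000 generated samples. The sketch below shows how such a loop might look in PyTorch, assuming an MMD-style penalty with an inverse multiquadric kernel; the `encoder`, `decoder`, and data `loader` are illustrative placeholders, not the exact architecture of Tolstikhin et al. (2018) or the paper's objective (4).

```python
# Hypothetical sketch of the reported setup: WAE training with a standard
# Gaussian prior on R^8, Adam (lr 1e-4), penalty weight lambda = 10, 10 epochs.
# The MMD kernel and network modules are illustrative, not the paper's exact ones.
import torch
import torch.nn as nn

LATENT_DIM = 8      # latent space R^8 (from the reported setup)
LAMBDA = 10.0       # weight on the latent dissimilarity penalty
LR = 1e-4           # Adam learning rate
EPOCHS = 10


def imq_kernel(x, y, c=2.0 * LATENT_DIM):
    """Inverse multiquadric kernel, a common choice for WAE-MMD penalties."""
    d2 = torch.cdist(x, y) ** 2
    return c / (c + d2)


def mmd_penalty(z_q, z_p):
    """Biased MMD^2 estimate between encoded codes z_q and prior samples z_p."""
    return (imq_kernel(z_q, z_q).mean()
            + imq_kernel(z_p, z_p).mean()
            - 2.0 * imq_kernel(z_q, z_p).mean())


def train_wae(encoder: nn.Module, decoder: nn.Module, loader):
    opt = torch.optim.Adam(
        list(encoder.parameters()) + list(decoder.parameters()), lr=LR)
    for _ in range(EPOCHS):
        for x, _ in loader:
            z = encoder(x)                    # deterministic codes Q(Z|X)
            x_hat = decoder(z)                # reconstructions
            z_prior = torch.randn_like(z)     # draws from the N(0, I) prior
            recon = ((x - x_hat) ** 2).flatten(1).sum(1).mean()
            loss = recon + LAMBDA * mmd_penalty(z, z_prior)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder, decoder
```

After training, the 1000 evaluation samples described in the row can be produced by pushing prior draws through the decoder, e.g. `decoder(torch.randn(1000, LATENT_DIM))`, and scored with any standard FID implementation against held-out real images.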