$t^3$-Variational Autoencoder: Learning Heavy-tailed Data with Student's t and Power Divergence
Authors: Juno Kim, Jaehyuk Kwon, Mincheol Cho, Hyunjong Lee, Joong-Ho Won
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | t3VAE demonstrates superior generation of low-density regions when trained on heavy-tailed synthetic data. Furthermore, we show that t3VAE significantly outperforms other models on CelebA and imbalanced CIFAR-100 datasets. |
| Researcher Affiliation | Academia | 1 Department of Mathematical Informatics, The University of Tokyo; 2 Center for Advanced Intelligence Project, RIKEN; 3 Department of Statistics, Seoul National University |
| Pseudocode | Yes | A summary of our framework is provided in Algorithm 1 to assist with implementation. |
| Open Source Code | Yes | The code is available on GitHub. |
| Open Datasets | Yes | We now showcase the effectiveness of our model on high-dimensional data via both reconstruction and generation tasks in CelebA (Liu et al., 2015; 2018)... we conduct reconstruction experiments with the CIFAR100-LT dataset (Cao et al., 2019), which is a long-tailed version of the original CIFAR-100 (Krizhevsky, 2009). |
| Dataset Splits | Yes | We first generate 200K train data, 200K validation data, and 500K test data from the heavy-tailed bimodal distribution (22). |
| Hardware Specification | Yes | All experiments are implemented via Python 3.8.10 with the PyTorch package (Paszke et al., 2019) version 1.13.1+cu117, and run on Linux Ubuntu 20.04 with Intel Xeon Silver 4114 @ 2.20GHz processors, an Nvidia Titan V GPU with 12GB memory, CUDA 11.3 and cuDNN 8.2. |
| Software Dependencies | Yes | All experiments are implemented via Python 3.8.10 with the PyTorch package (Paszke et al., 2019) version 1.13.1+cu117, and run on Linux Ubuntu 20.04 with Intel Xeon Silver 4114 @ 2.20GHz processors, an Nvidia Titan V GPU with 12GB memory, CUDA 11.3 and cuDNN 8.2. |
| Experiment Setup | Yes | In the training process, we use a batch size of 128 and employ the Adam optimizer (Kingma & Ba, 2014) with a learning rate of $1 \times 10^{-3}$ and weight decay $1 \times 10^{-4}$. Moreover, we adopt early stopping using validation data with patience 15 to prevent overfitting. All VAE models are trained for 50 epochs using a batch size of 128 and a latent variable dimension of 64 with the Adam optimizer. |
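
The quoted experiment setup maps onto a standard PyTorch training loop. The sketch below is not the authors' released code: the autoencoder architecture, the MSE stand-in loss, and the random tensors standing in for the image data are placeholder assumptions, and only the hyperparameters quoted in the Experiment Setup row (batch size 128, Adam with learning rate $1 \times 10^{-3}$ and weight decay $1 \times 10^{-4}$, latent dimension 64, 50 epochs, early stopping with patience 15 on validation loss) are taken from the paper.

```python
# Minimal sketch of the reported training configuration, assuming a placeholder
# model and data; only the hyperparameters below come from the paper's setup.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

LATENT_DIM = 64   # latent variable dimension reported in the paper
BATCH_SIZE = 128
EPOCHS = 50
PATIENCE = 15     # early-stopping patience on validation loss

# Placeholder autoencoder standing in for the paper's VAE variants.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, LATENT_DIM),
    nn.ReLU(),
    nn.Linear(LATENT_DIM, 3 * 64 * 64),
)

optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.MSELoss()  # stand-in objective; the paper's losses differ per model

# Dummy tensors standing in for the actual train/validation splits.
train_x = torch.randn(1024, 3, 64, 64)
val_x = torch.randn(256, 3, 64, 64)
train_loader = DataLoader(TensorDataset(train_x), batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(TensorDataset(val_x), batch_size=BATCH_SIZE)

best_val, patience_left = float("inf"), PATIENCE
for epoch in range(EPOCHS):
    model.train()
    for (x,) in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), x.flatten(1))
        loss.backward()
        optimizer.step()

    # Early stopping on validation loss, as described in the quoted setup.
    model.eval()
    with torch.no_grad():
        val_loss = sum(loss_fn(model(x), x.flatten(1)).item() for (x,) in val_loader)
    if val_loss < best_val:
        best_val, patience_left = val_loss, PATIENCE
    else:
        patience_left -= 1
        if patience_left == 0:
            break
```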