$t^3$-Variational Autoencoder: Learning Heavy-tailed Data with Student's t and Power Divergence

Authors: Juno Kim, Jaehyuk Kwon, Mincheol Cho, Hyunjong Lee, Joong-Ho Won

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "t3VAE demonstrates superior generation of low-density regions when trained on heavy-tailed synthetic data. Furthermore, we show that t3VAE significantly outperforms other models on CelebA and imbalanced CIFAR-100 datasets." |
| Researcher Affiliation | Academia | "(1) Department of Mathematical Informatics, The University of Tokyo; (2) Center for Advanced Intelligence Project, RIKEN; (3) Department of Statistics, Seoul National University" |
| Pseudocode | Yes | "A summary of our framework is provided in Algorithm 1 to assist with implementation." |
| Open Source Code | Yes | "The code is available on Github." |
| Open Datasets | Yes | "We now showcase the effectiveness of our model on high-dimensional data via both reconstruction and generation tasks in CelebA (Liu et al., 2015; 2018)... we conduct reconstruction experiments with the CIFAR100-LT dataset (Cao et al., 2019), which is a long-tailed version of the original CIFAR-100 (Krizhevsky, 2009)." |
| Dataset Splits | Yes | "We first generate 200K train data, 200K validation data, and 500K test data from the heavy-tailed bimodal distribution (22)." (sampling sketch below the table) |
| Hardware Specification | Yes | "All experiments are implemented via Python 3.8.10 with the PyTorch package (Paszke et al., 2019) version 1.13.1+cu117, and run on Linux Ubuntu 20.04 with Intel Xeon Silver 4114 @ 2.20GHz processors, an Nvidia Titan V GPU with 12GB memory, CUDA 11.3 and cuDNN 8.2." |
| Software Dependencies | Yes | "All experiments are implemented via Python 3.8.10 with the PyTorch package (Paszke et al., 2019) version 1.13.1+cu117, and run on Linux Ubuntu 20.04 with Intel Xeon Silver 4114 @ 2.20GHz processors, an Nvidia Titan V GPU with 12GB memory, CUDA 11.3 and cuDNN 8.2." (environment check below the table) |
| Experiment Setup | Yes | "In the training process, we use a batch size of 128 and employ the Adam optimizer (Kingma & Ba, 2014) with a learning rate of $1 \times 10^{-3}$ and weight decay $1 \times 10^{-4}$. Moreover, we adopt early stopping using validation data with patience 15 to prevent overfitting. All VAE models are trained for 50 epochs using a batch size of 128 and a latent variable dimension of 64 with the Adam optimizer." (training-loop sketch below the table) |
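
The synthetic-data split above (200K train / 200K validation / 500K test) is straightforward to mirror. The paper's heavy-tailed bimodal density (its Eq. 22) is not reproduced here, so the sampler below is only a placeholder: a two-component Student-t mixture whose degrees of freedom, mode locations, scale, and dimensionality are illustrative assumptions, not the authors' specification.

```python
import torch

def sample_heavy_tailed_bimodal(n, df=5.0, modes=(-2.0, 2.0), scale=1.0):
    """Placeholder for the paper's bimodal heavy-tailed density (its Eq. 22):
    a two-component Student-t mixture; df, modes, and scale are assumptions."""
    comp = torch.randint(0, 2, (n,))                        # pick a mode per sample
    noise = torch.distributions.StudentT(df).sample((n,))   # heavy-tailed noise
    return torch.tensor(modes)[comp] + scale * noise

# Split sizes reported in the paper: 200K train, 200K validation, 500K test.
train_x = sample_heavy_tailed_bimodal(200_000)
val_x = sample_heavy_tailed_bimodal(200_000)
test_x = sample_heavy_tailed_bimodal(500_000)
```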
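
To check whether a local environment matches the reported software stack (Python 3.8.10, PyTorch 1.13.1+cu117, CUDA 11.3, cuDNN 8.2, Nvidia Titan V with 12GB memory), a minimal version probe such as the following can be used; it is a convenience sketch for reproduction, not part of the authors' code.

```python
import platform
import torch

# Compare the local runtime against the versions reported in the paper.
print("Python:", platform.python_version(), "(reported: 3.8.10)")
print("PyTorch:", torch.__version__, "(reported: 1.13.1+cu117)")
print("CUDA:", torch.version.cuda, "(reported: 11.3)")
print("cuDNN:", torch.backends.cudnn.version(), "(reported: 8.2)")
print("GPU:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none",
      "(reported: Nvidia Titan V, 12GB)")
```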
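
The training configuration quoted above (batch size 128, Adam with learning rate $1 \times 10^{-3}$ and weight decay $1 \times 10^{-4}$, 50 epochs, latent dimension 64, early stopping with patience 15) maps onto a standard PyTorch loop. The sketch below is a minimal stand-in, not the authors' implementation: a toy Gaussian-VAE module and random tensors replace the $t^3$VAE model and the CelebA pipeline, only the hyperparameters come from the paper, and the paper's $\gamma$-power divergence objective would replace the placeholder loss.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hyperparameters quoted in the paper's experiment setup.
BATCH_SIZE, EPOCHS, LATENT_DIM = 128, 50, 64
LR, WEIGHT_DECAY, PATIENCE = 1e-3, 1e-4, 15

class ToyVAE(nn.Module):
    """Illustrative Gaussian-VAE stand-in; swap in the paper's t3VAE and
    gamma-power divergence objective for an actual reproduction."""
    def __init__(self, d_in=64, d_z=LATENT_DIM):
        super().__init__()
        self.enc = nn.Linear(d_in, 2 * d_z)   # outputs mean and log-variance
        self.dec = nn.Linear(d_z, d_in)

    def loss(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterize
        recon = ((self.dec(z) - x) ** 2).sum(-1).mean()
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1 - logvar).sum(-1).mean()
        return recon + kl   # placeholder objective

# Random tensors stand in for the real training and validation sets.
model = ToyVAE()
train_loader = DataLoader(TensorDataset(torch.randn(1024, 64)),
                          batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(TensorDataset(torch.randn(256, 64)), batch_size=BATCH_SIZE)
optimizer = torch.optim.Adam(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)

best_val, patience_left = float("inf"), PATIENCE
for epoch in range(EPOCHS):
    model.train()
    for (x,) in train_loader:
        optimizer.zero_grad()
        model.loss(x).backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = sum(model.loss(x).item() for (x,) in val_loader) / len(val_loader)

    # Early stopping on validation loss with patience 15, as reported.
    if val_loss < best_val:
        best_val, patience_left = val_loss, PATIENCE
    else:
        patience_left -= 1
        if patience_left == 0:
            break
```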