Tree Variational Autoencoders

Authors: Laura Manduchi, Moritz Vandenhirtz, Alain Ryser, Julia Vogt

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present empirically that TreeVAE provides a more competitive log-likelihood lower bound than the sequential counterparts. Finally, due to its generative nature, TreeVAE is able to generate new samples from the discovered clusters via conditional sampling. Our main contributions are as follows: (i) We propose a novel, deep probabilistic approach to hierarchical clustering that learns the optimal generative binary tree to mimic the hierarchies present in the data. (ii) We provide a thorough empirical assessment of the proposed approach on MNIST, Fashion-MNIST, 20Newsgroups, and Omniglot. In particular, we show that TreeVAE (a) outperforms related work on deep hierarchical clustering, (b) discovers meaningful patterns in the data and their hierarchical relationships, and (c) achieves a more competitive log-likelihood lower bound compared to the VAE and the LadderVAE, its sequential counterpart.
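To make the conditional sampling mentioned above concrete, here is a minimal sketch of top-down ancestral sampling through a learned generative tree. The `Node` class, its `router`, `transform`, and `decoder` modules, and the routing heuristic are hypothetical stand-ins for illustration, not the released implementation.

```python
import torch
import torch.nn as nn

class Node(nn.Module):
    """Toy tree node used only for this sketch: internal nodes carry a router,
    every node carries a Gaussian transition, and leaves carry a decoder."""
    def __init__(self, z_dim, x_dim=None, children=None):
        super().__init__()
        self.children_ = nn.ModuleList(children or [])                   # trailing underscore avoids nn.Module.children()
        self.router = nn.Sequential(nn.Linear(z_dim, 1), nn.Sigmoid())   # probability of routing right
        self.transform = nn.Linear(z_dim, 2 * z_dim)                     # outputs mean and log-variance
        self.decoder = nn.Linear(z_dim, x_dim) if x_dim else None        # only leaves decode

@torch.no_grad()
def conditional_sample(root, z_dim, n_samples=16):
    """Top-down ancestral sampling through the generative tree (sketch)."""
    z = torch.randn(n_samples, z_dim)                      # root latent from the standard normal prior
    node = root
    while len(node.children_) > 0:                         # descend until a leaf is reached
        p_right = node.router(z).squeeze(-1)               # Bernoulli routing probabilities
        right = torch.bernoulli(p_right).mean() > 0.5      # route the whole batch by majority (simplification)
        child = node.children_[1 if right else 0]
        mu, logvar = child.transform(z).chunk(2, dim=-1)   # Gaussian transition to the child latent
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        node = child
    return node.decoder(z)                                 # decode the leaf latents into samples

# Example: a depth-1 tree with two leaves over 784-dimensional (MNIST-like) outputs.
leaves = [Node(8, x_dim=784), Node(8, x_dim=784)]
samples = conditional_sample(Node(8, children=leaves), z_dim=8)
```

In the actual model, the routing decision would be sampled per example rather than by a batch-level majority vote; the simplification here only keeps the sketch short.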
Researcher Affiliation | Academia | Laura Manduchi, Moritz Vandenhirtz, Alain Ryser, Julia E. Vogt; Department of Computer Science, ETH Zurich, Switzerland
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks; it describes the model and the training process in text and mathematical formulas.
Open Source Code | Yes | The code is publicly available at https://github.com/lauramanduchi/treevae-pytorch.
Open Datasets | Yes | Datasets and Metrics: We evaluate the clustering and generative performance of TreeVAE on MNIST (LeCun et al., 1998), Fashion-MNIST (H. Xiao et al., 2017), 20Newsgroups (Lang, 1995), Omniglot (Lake et al., 2015), and Omniglot-5, where only 5 alphabets (Braille, Glagolitic, Cyrillic, Odia, and Bengali) are selected and used as true labels. We also perform hierarchical clustering experiments on real-world imaging data, namely CIFAR-10 and CIFAR-100 (Krizhevsky & Hinton, 2009) with 20 superclasses as labels, and CelebA (Z. Liu et al., 2015) using the contrastive extension (Sec. 2.6).
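For reference, a sketch of how these benchmark datasets can be obtained with standard libraries (torchvision and scikit-learn). The download root, the transform, and the 20Newsgroups handling are placeholders and not necessarily what the released code uses.

```python
from torchvision import datasets, transforms
from sklearn.datasets import fetch_20newsgroups

to_tensor = transforms.ToTensor()
root = "data/"                                                                        # placeholder download directory

mnist    = datasets.MNIST(root, train=True, download=True, transform=to_tensor)
fmnist   = datasets.FashionMNIST(root, train=True, download=True, transform=to_tensor)
omniglot = datasets.Omniglot(root, background=True, download=True, transform=to_tensor)
cifar10  = datasets.CIFAR10(root, train=True, download=True, transform=to_tensor)
cifar100 = datasets.CIFAR100(root, train=True, download=True, transform=to_tensor)   # the 20 superclass labels need a separate fine-to-coarse mapping
celeba   = datasets.CelebA(root, split="train", download=True, transform=to_tensor)
news     = fetch_20newsgroups(subset="train")                                         # raw text; the paper's preprocessing is not specified here
```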
Dataset Splits | Yes | The MNIST (LeCun et al., 1998) dataset... with a total of 60,000 training images and 10,000 test images. The Fashion-MNIST (H. Xiao et al., 2017) dataset... includes 60,000 training images and 10,000 test images. For both dataset versions and all experiments, we split the dataset into train/test splits with 80%/20% of the samples, respectively, stratifying across the characters of the dataset. The CIFAR-10 (Krizhevsky & Hinton, 2009) dataset... consists of 50,000 training and 10,000 test images... The CIFAR-100 (Krizhevsky & Hinton, 2009) dataset... consists of 50,000 training and 10,000 test images... For training, we select a subset of 100,000 random images from the training set and evaluate on the given test set of 19,962 images.
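The 80%/20% character-stratified split described above can be reproduced along these lines; the arrays below are dummies and the random seed is an arbitrary choice, so this is a sketch rather than the authors' exact split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
images = rng.rand(1000, 28, 28)                   # dummy data standing in for the Omniglot images
character_labels = rng.randint(0, 50, size=1000)  # dummy per-character labels

x_train, x_test, y_train, y_test = train_test_split(
    images, character_labels,
    test_size=0.2,                 # 80% train / 20% test
    stratify=character_labels,     # stratify across characters, as described
    random_state=0,
)
```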
Hardware Specification | Yes | All our experiments were run on RTX3080 GPUs, except for CelebA, where the memory requirement increased and we used an RTX3090.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. It mentions PyTorch via the code repository link, but no specific version is given in the text.
Experiment Setup | Yes | The trees are trained for N_t = 150 epochs at each growth step, and the final tree is finetuned for N_f = 200 epochs. For the real-world imaging experiments, we set the weight of the contrastive loss to 100. To reduce the risk of posterior collapse during training, we anneal the KL terms of the ELBO. Starting from 0, we increase the weight of the KL terms by 0.001 every epoch, except during the final finetuning of the full tree, where we linearly increase the weight by 0.01 per epoch until it reaches 1, such that the final model is trained on the complete ELBO.
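A minimal sketch of the KL-annealing schedule described above, assuming the epoch counter restarts at each phase (the quoted text does not state this explicitly):

```python
def kl_weight(epoch: int, finetuning: bool = False) -> float:
    """Linear KL annealing: +0.001 per epoch during growth steps,
    +0.01 per epoch (capped at 1) during the final finetuning."""
    step = 0.01 if finetuning else 0.001
    return min(1.0, epoch * step)

# Usage inside a (hypothetical) training loop over the N_f = 200 finetuning epochs:
for epoch in range(200):
    beta = kl_weight(epoch, finetuning=True)
    # loss = reconstruction_term + beta * kl_term   # annealed ELBO
```

If the counter does restart per phase, the weight only reaches 0.15 within a 150-epoch growth step, so the complete ELBO is used only during the final finetuning, consistent with the last sentence above.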