Multi-Facet Clustering Variational Autoencoders

Authors: Fabian Falck, Haoting Zhang, Matthew Willetts, George Nicholson, Christopher Yau, Chris C Holmes

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On image benchmarks, we demonstrate that our approach separates out and clusters over different aspects of the data in a disentangled manner. We demonstrate the usefulness of our model and its prior structure in four experimental analyses: (a) discovering a multi-facet structure, (b) compositionality of latent facets, (c) generative, unsupervised classification, and (d) diversity of generated samples from our model. We train our model on three image datasets: MNIST [5], 3DShapes (two configurations) [24] and SVHN [25].
Researcher Affiliation | Academia | 1 University of Oxford, 2 University of Cambridge, 3 University College London, 4 University of Manchester, 5 Health Data Research UK, 6 The Alan Turing Institute
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | We also provide our code implementing MFCVAE, using PyTorch Distributions [26], and reproducing our results at https://github.com/FabianFalck/mfcvae.
Open Datasets | Yes | We train our model on three image datasets: MNIST [5], 3DShapes (two configurations) [24] and SVHN [25].
Dataset Splits | No | For all datasets, we use the standard training and test splits. (A minimal data-loading sketch for these standard splits is given below the table.)
Hardware Specification | No | All computational experiments were carried out on a cluster infrastructure. No specific hardware details (e.g., GPU/CPU models, memory) were provided.
Software Dependencies | No | The paper lists software packages such as PyTorch, NumPy, Matplotlib, Seaborn, OpenCV, and scikit-learn but does not provide specific version numbers for any of them.
Experiment Setup | Yes | We run each model for 500 epochs with Adam [60] with a learning rate of 1e-4 and a batch size of 128. We found this set of hyperparameters to work well across all datasets and settings for the number of facets and for each facet's number of components... We use a learning rate of 1e-4, and we warm up the learning rate to 1e-4 over 1000 steps, and then linearly decay the learning rate to 0 over 500000 steps. We use a batch size of 128.
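
The optimizer settings quoted in the Experiment Setup row can be expressed in a few lines of PyTorch. The sketch below is illustrative only and is not taken from the authors' repository: it assumes a plain Adam optimizer at a base learning rate of 1e-4, a linear warm-up over the first 1,000 steps, and a linear decay to 0 measured over the reported 500,000 steps; `model` is a placeholder standing in for the MFCVAE.

```python
# Sketch of the reported training configuration (Adam, lr 1e-4, batch size 128,
# 1,000-step linear warm-up, linear decay of the learning rate to 0 over 500,000 steps).
# `model` is a placeholder; this is an illustration, not the authors' code.
import torch

model = torch.nn.Linear(784, 10)  # placeholder for the MFCVAE

base_lr = 1e-4
warmup_steps = 1_000
total_steps = 500_000

optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)

def lr_lambda(step: int) -> float:
    """Multiplicative factor on base_lr: linear warm-up, then linear decay to zero."""
    if step < warmup_steps:
        return step / warmup_steps
    return max(0.0, 1.0 - (step - warmup_steps) / (total_steps - warmup_steps))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Inside a training loop over 500 epochs with batches of 128, one would typically call
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
# once per step.
```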
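
For the "standard training and test splits" referenced in the Dataset Splits row, the snippet below is a minimal sketch of loading the MNIST and SVHN splits via torchvision. It is not the authors' data pipeline; transforms and paths are placeholders, and 3DShapes (distributed by DeepMind as a single HDF5 file) would need to be downloaded and split separately.

```python
# Illustrative only: standard MNIST and SVHN train/test splits via torchvision.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# MNIST: standard 60,000 train / 10,000 test split
mnist_train = datasets.MNIST(root="data", train=True, download=True, transform=to_tensor)
mnist_test = datasets.MNIST(root="data", train=False, download=True, transform=to_tensor)

# SVHN: standard 73,257 train / 26,032 test split
svhn_train = datasets.SVHN(root="data", split="train", download=True, transform=to_tensor)
svhn_test = datasets.SVHN(root="data", split="test", download=True, transform=to_tensor)
```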