Deep Generative Clustering with Multimodal Diffusion Variational Autoencoders
Authors: Emanuele Palumbo, Laura Manduchi, Sonia Laguna, Daphné Chopard, Julia E. Vogt
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that our proposed model improves generative performance over existing multimodal VAEs, particularly for unconditional generation. Furthermore, we propose a post-hoc procedure to automatically select the number of true clusters, thus mitigating critical limitations of previous clustering frameworks. Notably, our method compares favorably to alternative clustering approaches in weakly-supervised settings. Finally, we integrate recent advancements in diffusion models into the proposed method to improve generative quality for real-world images. |
| Researcher Affiliation | Academia | Emanuele Palumbo (1,2), Laura Manduchi (2), Sonia Laguna (2), Daphné Chopard (2,3) & Julia E. Vogt (2). (1) ETH AI Center; (2) Department of Computer Science, ETH Zurich; (3) Department of Intensive Care and Neonatology and Children's Research Center, University Children's Hospital Zurich, University of Zurich |
| Pseudocode | Yes | The proposed procedure is described in the pseudocode in Algorithm 1 and aims at obtaining a posterior cluster distribution p(c\|z) where exactly K clusters have positive probability and each latent cluster correctly models a different true cluster of the data. (A hedged sketch of one such selection heuristic appears after this table.) |
| Open Source Code | Yes | We share the code for our model at https://github.com/epalu/CMVAE. |
| Open Datasets | Yes | We first validate our contributions on the PolyMNIST dataset (Sutter et al., 2021), a semi-synthetic five-modality dataset depicting MNIST (LeCun et al., 2010) digits... We introduce a variation of the CUB Image-Captions dataset (Wah et al., 2011; Shi et al., 2019), which we name the CUB Image-Captions for Clustering (CUBICC) dataset. |
| Dataset Splits | Yes | For PolyMNIST there are 60000 samples for training, 5000 samples for validation, and 5000 samples for testing. For CUBICC, we have 11834 training, 638 validation, and 659 test samples. |
| Hardware Specification | No | The paper mentions "computational overhead" but does not specify any particular hardware components (e.g., CPU, GPU models, memory, or specific computing environments) used for running the experiments. |
| Software Dependencies | No | The paper mentions using ResNet architectures for encoders and decoders, but does not provide specific version numbers for any software, libraries, or frameworks used (e.g., PyTorch version, Python version, specific deep learning library versions). |
| Experiment Setup | Yes | CMVAE is trained for 250 epochs on this dataset, with a 1e-3 learning rate. (A placeholder training-loop sketch using these quoted values follows the table.) |
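
The quoted post-hoc selection procedure is specified authoritatively in the paper's Algorithm 1; we only quote its goal above. As a rough, hedged illustration of the underlying idea, the NumPy sketch below prunes latent clusters by thresholding their aggregate posterior mass. The function name `select_active_clusters`, the `threshold` value, and the thresholding rule itself are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def select_active_clusters(responsibilities: np.ndarray, threshold: float = 1e-2):
    """Hypothetical post-hoc cluster selection sketch (not the paper's Algorithm 1).

    `responsibilities` is an (N, L) array of posterior probabilities p(c|z_n)
    for N latent codes over L candidate clusters, with L deliberately chosen
    larger than the expected number of true clusters. Clusters whose average
    posterior mass falls below `threshold` are dropped; the survivors
    approximate the K clusters that remain "active".
    """
    mass = responsibilities.mean(axis=0)       # average posterior mass per cluster, shape (L,)
    active = np.flatnonzero(mass > threshold)  # indices of retained clusters
    return active, mass[active] / mass[active].sum()  # indices and renormalized weights
```

For example, `select_active_clusters(np.random.dirichlet(np.ones(40), size=1000))` runs the heuristic over 1000 random responsibility vectors across 40 candidate clusters and returns the indices carrying non-negligible mass together with their renormalized weights.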
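
For the experiment-setup row, only the epoch count and learning rate are quoted. The PyTorch sketch below wires those two values into a minimal training loop; the Adam optimizer, the toy linear model, and the random batch are placeholders rather than details confirmed by the paper (the actual model and objective live in the CMVAE repository linked above).

```python
import torch
from torch import nn

# Quoted hyperparameters: 250 epochs, 1e-3 learning rate. Everything else
# here (Adam, the toy model, the random data) is a placeholder assumption.
EPOCHS = 250
LEARNING_RATE = 1e-3

model = nn.Linear(8, 8)                # stand-in for the CMVAE model
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
data = torch.randn(64, 8)              # stand-in batch for the dataset samples

for epoch in range(EPOCHS):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(data), data)  # stand-in for the CMVAE objective
    loss.backward()
    optimizer.step()
```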