Deep Generative Clustering with Multimodal Diffusion Variational Autoencoders
Authors: Emanuele Palumbo, Laura Manduchi, Sonia Laguna, Daphné Chopard, Julia E. Vogt
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that our proposed model improves generative performance over existing multimodal VAEs, particularly for unconditional generation. Furthermore, we propose a post-hoc procedure to automatically select the number of true clusters, thus mitigating critical limitations of previous clustering frameworks. Notably, our method compares favorably to alternative clustering approaches in weakly-supervised settings. Finally, we integrate recent advancements in diffusion models into the proposed method to improve generative quality for real-world images. |
| Researcher Affiliation | Academia | Emanuele Palumbo (1,2), Laura Manduchi (2), Sonia Laguna (2), Daphné Chopard (2,3) & Julia E. Vogt (2). (1) ETH AI Center; (2) Department of Computer Science, ETH Zurich; (3) Department of Intensive Care and Neonatology and Children's Research Center, University Children's Hospital Zurich, University of Zurich |
| Pseudocode | Yes | The proposed procedure is described in the pseudocode in Algorithm 1 and aims at obtaining a posterior cluster distribution p(c\|z) where exactly K clusters have positive probability and each latent cluster correctly models a different true cluster of the data. (A hedged sketch of one such selection heuristic appears after this table.) |
| Open Source Code | Yes | We share the code for our model at https://github.com/epalu/CMVAE. |
| Open Datasets | Yes | We first validate our contributions on the PolyMNIST dataset (Sutter et al., 2021), a semi-synthetic five-modality dataset depicting MNIST (LeCun et al., 2010) digits... We introduce a variation of the CUB Image-Captions dataset (Wah et al., 2011; Shi et al., 2019), which we name the CUB Image-Captions for Clustering (CUBICC) dataset. |
| Dataset Splits | Yes | For PolyMNIST there are 60000 samples for training, 5000 samples for validation, and 5000 samples for testing. For CUBICC, we have 11834 training, 638 validation, and 659 test samples. |
| Hardware Specification | No | The paper mentions "computational overhead" but does not specify any particular hardware components (e.g., CPU, GPU models, memory, or specific computing environments) used for running the experiments. |
| Software Dependencies | No | The paper mentions using ResNet architectures for encoders and decoders, but does not provide specific version numbers for any software, libraries, or frameworks used (e.g., PyTorch version, Python version, specific deep learning library versions). |
| Experiment Setup | Yes | CMVAE is trained for 250 epochs on this dataset, with a 1e-3 learning rate. (A placeholder training-loop sketch using these quoted values follows the table.) |
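
The quoted post-hoc selection procedure is specified authoritatively in the paper's Algorithm 1; we only quote its goal above. As a rough, hedged illustration of the underlying idea, the NumPy sketch below prunes latent clusters by thresholding their aggregate posterior mass. The function name `select_active_clusters`, the `threshold` value, and the thresholding rule itself are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def select_active_clusters(responsibilities: np.ndarray, threshold: float = 1e-2):
    """Hypothetical post-hoc cluster selection sketch (not the paper's Algorithm 1).

    `responsibilities` is an (N, L) array of posterior probabilities p(c|z_n)
    for N latent codes over L candidate clusters, with L deliberately chosen
    larger than the expected number of true clusters. Clusters whose average
    posterior mass falls below `threshold` are dropped; the survivors
    approximate the K clusters that remain "active".
    """
    mass = responsibilities.mean(axis=0)       # average posterior mass per cluster, shape (L,)
    active = np.flatnonzero(mass > threshold)  # indices of retained clusters
    return active, mass[active] / mass[active].sum()  # indices and renormalized weights
```

For example, `select_active_clusters(np.random.dirichlet(np.ones(40), size=1000))` runs the heuristic over 1000 random responsibility vectors across 40 candidate clusters and returns the indices carrying non-negligible mass together with their renormalized weights.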
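
For the experiment-setup row, only the epoch count and learning rate are quoted. The PyTorch sketch below wires those two values into a minimal training loop; the Adam optimizer, the toy linear model, and the random batch are placeholders rather than details confirmed by the paper (the actual model and objective live in the CMVAE repository linked above).

```python
import torch
from torch import nn

# Quoted hyperparameters: 250 epochs, 1e-3 learning rate. Everything else
# here (Adam, the toy model, the random data) is a placeholder assumption.
EPOCHS = 250
LEARNING_RATE = 1e-3

model = nn.Linear(8, 8)                # stand-in for the CMVAE model
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
data = torch.randn(64, 8)              # stand-in batch for the dataset samples

for epoch in range(EPOCHS):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(data), data)  # stand-in for the CMVAE objective
    loss.backward()
    optimizer.step()
```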