Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives

Authors: Marcel Hirt, Domenico Campolo, Victoria Leong, Juan-Pablo Ortega

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our numerical experiments illustrate trade-offs for multi-modal variational objectives and various aggregation schemes."
Researcher Affiliation | Academia | Marcel Hirt (EMAIL), School of Social Sciences, Nanyang Technological University, Singapore; Domenico Campolo (EMAIL), School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore; Victoria Leong (EMAIL), School of Social Sciences, Nanyang Technological University, Singapore; Juan-Pablo Ortega (EMAIL), School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
Pseudocode | Yes | Algorithm 1: Single training step for computing unbiased gradients of L(x).
Input: Multi-modal data point x; generative parameter θ; variational parameters ϕ = (φ, ϑ).
Sample S ~ ρ. Sample ϵ_S, ϵ_M ~ p.
Set z_S = t_S(ϕ, ϵ_S, x_S) and z_M = t_M(ϕ, ϵ_M, x_M).
Stop gradients of the variational parameters: ϕ̄ = stop_grad(ϕ).
Set L̂_S(θ, ϕ̄) = log p_θ(x_S | z_S) + β log p_θ(z_S) − β log q_ϕ̄(z_S | x_S).
Set L̂_{\S}(θ, ϕ) = log p_θ(x_{\S} | z_M) + β log q_ϕ(z_M | x_S) − β log q_ϕ̄(z_M | x_M).
Output: ∇_{θ,ϕ} [ L̂_S(θ, ϕ̄) + L̂_{\S}(θ, ϕ) ].
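The stop-gradient construction in Algorithm 1 can be sketched in JAX (the library the paper's implementation uses, per the Software Dependencies entry below). This is an illustrative toy, not the authors' code: the scalar parameters, linear reparameterisation maps, and isotropic-Gaussian log-densities are all assumptions standing in for the paper's encoder/decoder networks.

```python
import jax
import jax.numpy as jnp


def log_normal(x, mean):
    # Toy isotropic-Gaussian log-density (up to a constant); a stand-in
    # for the model's decoder/encoder log-densities.
    return -0.5 * jnp.sum((x - mean) ** 2)


def loss(phi, theta, x_s, x_m, eps_s, eps_m, beta=1.0):
    # Reparameterised samples z_S = t_S(phi, eps_S, x_S), z_M = t_M(phi, eps_M, x_M);
    # here t is an assumed toy linear map.
    z_s = phi * x_s + eps_s
    z_m = phi * x_m + eps_m
    # Stop gradients of the variational parameters for the "bar" terms,
    # as in Algorithm 1.
    phi_bar = jax.lax.stop_gradient(phi)
    # L_S(theta, phi_bar) = log p(x_S|z_S) + beta*log p(z_S) - beta*log q_bar(z_S|x_S)
    l_s = (log_normal(x_s, theta * z_s)
           + beta * log_normal(z_s, 0.0)
           - beta * log_normal(z_s, phi_bar * x_s))
    # L_{\S}(theta, phi) = log p(x_{\S}|z_M) + beta*log q(z_M|x_S) - beta*log q_bar(z_M|x_M)
    l_ns = (log_normal(x_m, theta * z_m)
            + beta * log_normal(z_m, phi * x_s)
            - beta * log_normal(z_m, phi_bar * x_m))
    # Differentiating l_s + l_ns w.r.t. (phi, theta) gives the estimator's output.
    return l_s + l_ns


grads = jax.grad(loss, argnums=(0, 1))(
    0.5, 1.0, jnp.ones(3), jnp.ones(3), jnp.zeros(3), jnp.zeros(3))
```

Because `phi_bar` is wrapped in `stop_gradient`, the `q_ϕ̄` terms contribute to the loss value but not to the gradient with respect to ϕ, which is what makes the resulting gradient estimate unbiased for the tighter objective.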
Open Source Code | Yes | "A reference implementation is available at https://github.com/marcelah/Masked_Multimodal_VAE."
Open Datasets | Yes | "Following previous work (Sutter et al., 2020; 2021; Javaloy et al., 2022), we consider a tri-modal dataset based on augmenting the MNIST-SVHN dataset (Shi et al., 2019) with a text-based modality."
Dataset Splits | Yes | "The MNIST-SVHN-Text data set is taken from the code accompanying Sutter et al. (2021) with around 1.1 million train and 200k test samples."
Hardware Specification | Yes | "All experiments except Section 5.3 were run on a CPU server using one or two CPU cores. The experiments in Section 5.3 were run on a GPU server using one NVIDIA A100."
Software Dependencies | Yes | "Our implementation is based on JAX (Bradbury et al., 2018) and Flax (Heek et al., 2023)."
Experiment Setup | Yes | "All models are trained for 100 epochs with a batch size of 250 using Adam (Kingma and Ba, 2014) and a cosine decay schedule from 0.0005 to 0.0001."