Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives

Authors: Marcel Hirt, Domenico Campolo, Victoria Leong, Juan-Pablo Ortega

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our numerical experiments illustrate trade-offs for multi-modal variational objectives and various aggregation schemes."
Researcher Affiliation | Academia | Marcel Hirt (EMAIL), School of Social Sciences, Nanyang Technological University, Singapore; Domenico Campolo (EMAIL), School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore; Victoria Leong (EMAIL), School of Social Sciences, Nanyang Technological University, Singapore; Juan-Pablo Ortega (EMAIL), School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
Pseudocode | Yes | Algorithm 1: Single training step for computing unbiased gradients of L(x).
Input: Multi-modal data point x; generative parameter θ; variational parameters ϕ = (φ, ϑ).
Sample S ~ ρ. Sample ϵ_S, ϵ_M ~ p.
Set z_S = t_S(ϕ, ϵ_S, x_S) and z_M = t_M(ϕ, ϵ_M, x_M).
Stop gradients of the variational parameters: ϕ̄ = stop_grad(ϕ).
Set L̂_S(θ, ϕ̄) = log p_θ(x_S | z_S) + β log p_θ(z_S) − β log q_ϕ̄(z_S | x_S).
Set L̂_{\S}(θ, ϕ) = log p_θ(x_{\S} | z_M) + β log q_ϕ(z_M | x_S) − β log q_ϕ̄(z_M | x_M).
Output: ∇_{θ,ϕ} [ L̂_S(θ, ϕ̄) + L̂_{\S}(θ, ϕ) ].
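The stop-gradient construction in Algorithm 1 can be sketched in JAX (the library the paper's implementation uses, per the Software Dependencies entry below). This is an illustrative toy, not the authors' code: the scalar parameters, linear reparameterisation maps, and isotropic-Gaussian log-densities are all assumptions standing in for the paper's encoder/decoder networks.

```python
import jax
import jax.numpy as jnp


def log_normal(x, mean):
    # Toy isotropic-Gaussian log-density (up to a constant); a stand-in
    # for the model's decoder/encoder log-densities.
    return -0.5 * jnp.sum((x - mean) ** 2)


def loss(phi, theta, x_s, x_m, eps_s, eps_m, beta=1.0):
    # Reparameterised samples z_S = t_S(phi, eps_S, x_S), z_M = t_M(phi, eps_M, x_M);
    # here t is an assumed toy linear map.
    z_s = phi * x_s + eps_s
    z_m = phi * x_m + eps_m
    # Stop gradients of the variational parameters for the "bar" terms,
    # as in Algorithm 1.
    phi_bar = jax.lax.stop_gradient(phi)
    # L_S(theta, phi_bar) = log p(x_S|z_S) + beta*log p(z_S) - beta*log q_bar(z_S|x_S)
    l_s = (log_normal(x_s, theta * z_s)
           + beta * log_normal(z_s, 0.0)
           - beta * log_normal(z_s, phi_bar * x_s))
    # L_{\S}(theta, phi) = log p(x_{\S}|z_M) + beta*log q(z_M|x_S) - beta*log q_bar(z_M|x_M)
    l_ns = (log_normal(x_m, theta * z_m)
            + beta * log_normal(z_m, phi * x_s)
            - beta * log_normal(z_m, phi_bar * x_m))
    # Differentiating l_s + l_ns w.r.t. (phi, theta) gives the estimator's output.
    return l_s + l_ns


grads = jax.grad(loss, argnums=(0, 1))(
    0.5, 1.0, jnp.ones(3), jnp.ones(3), jnp.zeros(3), jnp.zeros(3))
```

Because `phi_bar` is wrapped in `stop_gradient`, the `q_ϕ̄` terms contribute to the loss value but not to the gradient with respect to ϕ, which is what makes the resulting gradient estimate unbiased for the tighter objective.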
Open Source Code | Yes | "A reference implementation is available at https://github.com/marcelah/Masked_Multimodal_VAE."
Open Datasets | Yes | "Following previous work (Sutter et al., 2020; 2021; Javaloy et al., 2022), we consider a tri-modal dataset based on augmenting the MNIST-SVHN dataset (Shi et al., 2019) with a text-based modality."
Dataset Splits | Yes | "The MNIST-SVHN-Text data set is taken from the code accompanying Sutter et al. (2021) with around 1.1 million train and 200k test samples."
Hardware Specification | Yes | "All experiments except Section 5.3 were run on a CPU server using one or two CPU cores. The experiments in Section 5.3 were run on a GPU server using one NVIDIA A100."
Software Dependencies | Yes | "Our implementation is based on JAX (Bradbury et al., 2018) and Flax (Heek et al., 2023)."
Experiment Setup | Yes | "All models are trained for 100 epochs with a batch size of 250 using Adam (Kingma and Ba, 2014) and a cosine decay schedule from 0.0005 to 0.0001."