Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Disentangled Cross-Modal Representation Learning with Enhanced Mutual Supervision

Authors: Lu Gao, Wenlan Chen, Daoyuan Wang, Fei Guo, Cheng Liang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results demonstrate significant improvements of our model over existing methods on various tasks including cross-modal generation, clustering and classification. ... We conduct an additional experiment on a spatial transcriptomics dataset of human breast cancer [39]. ... We compare our model with eight multimodal VAE baselines and five spatial domain clustering methods...
Researcher Affiliation | Academia | Lu Gao, School of Computer Science and Engineering, Central South University, Changsha, China, EMAIL; Wenlan Chen, School of Computer Science and Engineering, Central South University, Changsha, China, EMAIL; Daoyuan Wang, School of Computer Science and Engineering, Central South University, Changsha, China, EMAIL; Fei Guo, School of Computer Science and Engineering, Central South University, Changsha, China, EMAIL; Cheng Liang, School of Information Science and Engineering, Shandong Normal University, Jinan, China, EMAIL
Pseudocode | Yes | Algorithm 1: Optimization Procedure of DCMEM. Input: Multimodal dataset: D = D_s ∪ D_t ∪ D_{s,t}; Training epochs number: M; Hyperparameters: α; Model parameters {Φ, Ψ}. Output: Latent representation z_s and z_t
Open Source Code | Yes | We provide the complete source code along with instructions, including README files and environment settings in the supplementary material. All datasets used in our experiments are publicly available, and we include clear instructions for accessing and preprocessing the data.
Open Datasets | Yes | MNIST_SVHN: originally published in [8], downloaded the data from http://yann.lecun.com/exdb/mnist, http://ufldl.stanford.edu/housenumbers and the code from https://github.com/iffsid/mmvae, licensed under GPL 3.0. CUBICC: originally published in [12], downloaded the data from https://polybox.ethz.ch/index.php/s/LRkTC2oa6YHHlUj/download, published under the MIT license. Human Breast Cancer: originally published in [39], downloaded the data from https://www.10xgenomics.com/datasets/human-breast-cancer-block-a-section-1-1-standard1-0-0, published under the CC BY 4.0 license.
Dataset Splits | Yes | To assess the model's capability in handling partially observed datasets, we construct a set of incomplete bimodal datasets by randomly removing one modality at missing rates of η ∈ {0.25, 0.5, 0.75} and then train MVAE, MoPoE, MEME, MVP as well as our method on these modified datasets.
Hardware Specification | Yes | All experiments are conducted on a machine equipped with an NVIDIA GeForce RTX 2080 Ti GPU and 64 GB of RAM running Ubuntu 18.04.
Software Dependencies | No | All experiments are conducted on a machine equipped with an NVIDIA GeForce RTX 2080 Ti GPU and 64 GB of RAM running Ubuntu 18.04.
Experiment Setup | Yes | For our model, we use a ResNet encoder and decoder for image data, and convolutional encoders and decoders for text data. The parameter α is set to 1. For the MNIST-SVHN dataset, the dimensions of the shared and specific latent spaces are set to 32. We use the Adam optimizer with a learning rate of 5e-4, a batch size of 64 and train the model for 100 epochs. For the CUBICC dataset, the dimensions of the shared and specific latent spaces are set to 48 and 16, respectively. The Adam optimizer is used with a learning rate of 1e-4, a batch size of 16 and training is conducted for 200 epochs. For the spatial transcriptomics dataset... Both the shared and specific latent dimensions are set to 32. Optimization is performed using Adam with a learning rate of 5e-4, a batch size of 64 and 100 training epochs.
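The Dataset Splits entry describes building incomplete bimodal datasets by randomly removing one modality at a given missing rate η. The following is a minimal sketch of one way such a split could be simulated, not the authors' code. The function name `make_incomplete_masks`, the seeding, and the choice to drop either modality with equal probability are all assumptions made for illustration; the paper excerpt does not specify these details.

```python
import random


def make_incomplete_masks(n_samples, missing_rate, seed=0):
    """Return a list of (has_s, has_t) availability flags, one per sample.

    Assumptions (not stated in the source): a fraction `missing_rate` of the
    samples loses exactly one modality, and the dropped modality is chosen
    uniformly at random between the two.
    """
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must be in [0, 1]")
    rng = random.Random(seed)
    n_incomplete = round(missing_rate * n_samples)
    # Pick which samples become partially observed.
    incomplete = set(rng.sample(range(n_samples), n_incomplete))
    masks = []
    for i in range(n_samples):
        if i in incomplete:
            drop_s = rng.random() < 0.5
            # Exactly one modality survives for this sample.
            masks.append((not drop_s, drop_s))
        else:
            masks.append((True, True))
    return masks


if __name__ == "__main__":
    for eta in (0.25, 0.5, 0.75):
        masks = make_incomplete_masks(1000, eta)
        n_partial = sum(1 for has_s, has_t in masks if not (has_s and has_t))
        print(f"eta={eta}: {n_partial} of {len(masks)} samples are partially observed")
```

Under these assumptions, samples with both flags set correspond to the paired subset D_{s,t} in Algorithm 1, and the remaining samples fall into the single-modality subsets D_s or D_t.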
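For readers who want the hyperparameters reported in the Experiment Setup entry in machine-readable form, they can be collected into a small configuration table. This is a sketch only: the `TrainConfig` class, its field names, and the dictionary keys are hypothetical and do not come from the paper's released code; only the numeric values are taken from the entry above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the per-dataset settings quoted above."""
    shared_dim: int       # dimension of the shared latent space
    specific_dim: int     # dimension of the modality-specific latent space
    learning_rate: float  # Adam learning rate
    batch_size: int
    epochs: int
    alpha: float = 1.0    # the entry states that alpha is set to 1


# Dictionary keys are illustrative names, not identifiers from the paper.
CONFIGS = {
    "mnist_svhn": TrainConfig(
        shared_dim=32, specific_dim=32, learning_rate=5e-4, batch_size=64, epochs=100
    ),
    "cubicc": TrainConfig(
        shared_dim=48, specific_dim=16, learning_rate=1e-4, batch_size=16, epochs=200
    ),
    "spatial_transcriptomics": TrainConfig(
        shared_dim=32, specific_dim=32, learning_rate=5e-4, batch_size=64, epochs=100
    ),
}

if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(name, cfg)
```

Settings the entry does not report, such as weight decay or learning-rate schedules, are deliberately left out rather than guessed.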