Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Learning Reconfigurable Representations for Multimodal Federated Learning with Missing Data

Authors: Duong Nguyen, Nghia Hoang, Thanh Trung Huynh, Quoc Viet Hung Nguyen, Phi Le Nguyen

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results on multiple federated multimodal benchmarks with diverse data-missing patterns across clients demonstrate the efficacy of the proposed method, achieving up to 36.45% performance improvement under severe data incompleteness. The method is also supported by a theoretical analysis with an explicit performance bound that matches our empirical observations. Our source codes are provided at https://github.com/nmduonggg/PEPSY
Researcher Affiliation | Academia | Duong M. Nguyen (University of Illinois Urbana-Champaign, US); Trong Nghia Hoang (Washington State University, US); Thanh Trung Huynh (Vin University, Vietnam); Quoc Viet Hung Nguyen (Griffith University, Australia); Phi Le Nguyen (Hanoi University of Science and Technology, Vietnam)
Pseudocode | No | The paper describes the methodology using textual explanations and figures, but does not include any explicit pseudocode blocks or algorithm listings.
Open Source Code | Yes | Our source codes are provided at https://github.com/nmduonggg/PEPSY
Open Datasets | Yes | Our approach is evaluated on two datasets: PTB-XL [53] (12 modalities) and Sleep-EDF [24] (5 modalities). ... PTB-XL, a large publicly available electrocardiography dataset. Scientific Data, 7(1):154, 2020.
Dataset Splits | Yes | Each dataset is split into 80% for training and 20% for testing, with the former distributed across K clients in both IID and Non-IID settings. Following [39], we define p_s as the ratio of samples with missing modalities, and p_m as the ratio of missing modalities within those samples. The missing degree is then defined as p_m × p_s, representing the overall proportion of instances with missing modalities. ... For the Non-IID setting, we use a Dirichlet distribution with α = 0.5 to distribute training data points.
Hardware Specification | Yes | Experiments are run on an A6000 GPU with 48GB of memory.
Software Dependencies | No | The paper mentions using an "Inception Network as the modality encoder" and "Stochastic Gradient Descent (SGD) [49]" for optimization, but does not provide specific version numbers for any software libraries, frameworks, or programming languages.
Experiment Setup | Yes | The embedding dimension is set to C = 128. There are K = 32 clients in total, with 10 clients randomly selected to participate in each training round. Each selected client trains the model for E = 3 epochs per round. Optimization is done using Stochastic Gradient Descent (SGD) [49]. Communication with the server occurs over T = 1000 rounds. Both the alignment contrastive weight (λ) and the relevance regularization weight (η) are set to 0.1 for all experiments. However, λ is increased to 0.2 when p_m ∈ {0.8, 1.0}, corresponding to extreme missing modality scenarios that require stronger alignment. Detailed hyperparameter settings are listed in Tab. 4.
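
The data split and setup quoted in the Dataset Splits and Experiment Setup rows can be illustrated with a short sketch. This is not code from the PEPSY repository. It assumes NumPy, and every name in it (`SETUP`, `alignment_weight`, `train_test_split_indices`, `dirichlet_partition`, `sample_missing_mask`) is hypothetical. The quoted text gives only the Dirichlet parameter α = 0.5, so the per-class label-skew recipe used here is an assumption, as is the way missing modalities are drawn for each affected sample.

```python
import numpy as np

# Values quoted in the Experiment Setup row; the dict layout is hypothetical.
SETUP = {"embed_dim": 128, "num_clients": 32, "clients_per_round": 10,
         "local_epochs": 3, "rounds": 1000, "eta": 0.1}


def alignment_weight(p_m):
    """Return lambda: 0.1 by default, 0.2 when p_m is 0.8 or 1.0."""
    return 0.2 if any(np.isclose(p_m, v) for v in (0.8, 1.0)) else 0.1


def train_test_split_indices(n, rng, train_frac=0.8):
    """Shuffle n indices and cut them into train and test parts."""
    perm = rng.permutation(n)
    cut = int(train_frac * n)
    return perm[:cut], perm[cut:]


def dirichlet_partition(labels, num_clients, alpha, rng):
    """Assumed Non-IID recipe: split each class across clients with
    proportions drawn from Dirichlet(alpha)."""
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        shares = rng.dirichlet(alpha * np.ones(num_clients))
        # Convert cumulative shares into cut points within this class.
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices


def sample_missing_mask(n_samples, n_modalities, p_s, p_m, rng):
    """Boolean mask, True = modality present. A fraction p_s of samples
    is affected; each affected sample loses round(p_m * n_modalities)
    modalities chosen uniformly at random (an assumed scheme)."""
    mask = np.ones((n_samples, n_modalities), dtype=bool)
    n_affected = int(round(p_s * n_samples))
    affected = rng.choice(n_samples, size=n_affected, replace=False)
    n_drop = int(round(p_m * n_modalities))
    for i in affected:
        dropped = rng.choice(n_modalities, size=n_drop, replace=False)
        mask[i, dropped] = False
    return mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)  # seed chosen arbitrarily
    labels = rng.integers(0, 5, size=1000)  # synthetic stand-in labels
    train_idx, test_idx = train_test_split_indices(len(labels), rng)
    parts = dirichlet_partition(labels[train_idx], SETUP["num_clients"], 0.5, rng)
    mask = sample_missing_mask(len(train_idx), 12, p_s=0.5, p_m=0.5, rng=rng)
    print(len(train_idx), len(test_idx), len(parts), (~mask).mean())
```

Under this sampling scheme, the fraction of masked entries equals the product of the two ratios whenever both roundings are exact. A smaller α concentrates each class on fewer clients, so α = 0.5 yields a noticeably skewed label distribution per client.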