Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Learning Reconfigurable Representations for Multimodal Federated Learning with Missing Data
Authors: Duong Nguyen, Nghia Hoang, Thanh Trung Huynh, Quoc Viet Hung Nguyen, Phi Le Nguyen
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on multiple federated multimodal benchmarks with diverse data-missing patterns across clients demonstrate the efficacy of the proposed method, achieving up to 36.45% performance improvement under severe data incompleteness. The method is also supported by a theoretical analysis with an explicit performance bound that matches our empirical observations. Our source codes are provided at https://github.com/nmduonggg/PEPSY |
| Researcher Affiliation | Academia | Duong M. Nguyen University of Illinois Urbana-Champaign, US EMAIL Trong Nghia Hoang Washington State University, US EMAIL Thanh Trung Huynh Vin University, Vietnam EMAIL Quoc Viet Hung Nguyen Griffith University, Australia EMAIL Phi Le Nguyen Hanoi University of Science and Technology, Vietnam EMAIL |
| Pseudocode | No | The paper describes the methodology using textual explanations and figures, but does not include any explicit pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | Our source codes are provided at https://github.com/nmduonggg/PEPSY |
| Open Datasets | Yes | Our approach is evaluated on two datasets: PTBXL [53] (12 modalities) and Sleep-EDF [24] (5 modalities). ... PTB-XL, a large publicly available electrocardiography dataset. Scientific Data, 7(1):154, 2020. |
| Dataset Splits | Yes | Each dataset is split into 80% for training and 20% for testing, with the former distributed across K clients in both IID and Non-IID settings. Following [39], we define $p_s$ as the ratio of samples with missing modalities, and $p_m$ as the ratio of missing modalities within those samples. The missing degree is then defined as $p_m p_s$, representing the overall proportion of instances with missing modalities. ... For Non-IID setting, we use Dirichlet distribution with α = 0.5 to distribute training data points. |
| Hardware Specification | Yes | Experiments are run on an A6000 GPU with 48GB of memory. |
| Software Dependencies | No | The paper mentions using an "Inception Network as the modality encoder" and "Stochastic Gradient Descent (SGD) [49]" for optimization, but does not provide specific version numbers for any software libraries, frameworks, or programming languages. |
| Experiment Setup | Yes | The embedding dimension is set to C = 128. There are K = 32 clients in total, with 10 clients randomly selected to participate in each training round. Each selected client trains the model for E = 3 epochs per round. Optimization is done using Stochastic Gradient Descent (SGD) [49]. Communication with the server occurs over T = 1000 rounds. Both the alignment contrastive weight (λ) and the relevance regularization weight (η) are set to 0.1 for all experiments. However, λ is increased to 0.2 when $p_m$ ∈ {0.8, 1.0}, corresponding to extreme missing modality scenarios that require stronger alignment. Detailed hyperparameter settings are listed in Tab. 4. |
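
The split and client setup quoted in the table (80/20 train/test split, Dirichlet partitioning with α = 0.5 across K = 32 clients, 10 clients sampled per round, missing degree as the product of the two ratios) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, and the label-wise Dirichlet partitioning shown here is a common convention that is assumed, since the quoted text does not say exactly how the Dirichlet distribution is applied.

```python
import random


def split_train_test(n_samples, train_frac=0.8, seed=0):
    """Shuffle sample indices and split them into train/test (80/20 by default)."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = int(train_frac * n_samples)
    return indices[:cut], indices[cut:]


def dirichlet(alpha, k, rng):
    """Draw one symmetric Dirichlet(alpha) vector of length k via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def partition_non_iid(labels, train_idx, num_clients=32, alpha=0.5, seed=0):
    """Assumed scheme: for each class, split its samples across clients
    according to proportions drawn from Dirichlet(alpha)."""
    rng = random.Random(seed)
    by_class = {}
    for i in train_idx:
        by_class.setdefault(labels[i], []).append(i)

    clients = [[] for _ in range(num_clients)]
    for cls in sorted(by_class):
        idx = by_class[cls]
        rng.shuffle(idx)
        props = dirichlet(alpha, num_clients, rng)
        start, cumulative = 0, 0.0
        for c in range(num_clients):
            cumulative += props[c]
            # The last client takes whatever remains so no sample is dropped.
            end = len(idx) if c == num_clients - 1 else round(cumulative * len(idx))
            end = min(max(end, start), len(idx))
            clients[c].extend(idx[start:end])
            start = end
    return clients


def missing_degree(p_s, p_m):
    """Missing degree as the product of p_m and p_s, per the quoted definition."""
    return p_m * p_s


def sample_clients(num_clients, num_selected, rng):
    """Pick the clients that participate in one communication round."""
    return rng.sample(range(num_clients), num_selected)
```

With a smaller α the per-class proportions become more skewed, so each client sees fewer classes; α = 0.5 is the value quoted in the Dataset Splits row.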