Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Federated Dialogue-Semantic Diffusion for Emotion Recognition under Incomplete Modalities
Authors: Xihang Qiu, Jiarong Cheng, Yuhao Fang, Wanpeng Zhang, Yao Lu, Ye Zhang, Chun Li
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the IEMOCAP, CMU-MOSI, and CMU-MOSEI datasets demonstrate that FedDISC achieves superior emotion classification performance across diverse missing modality patterns, outperforming existing approaches. |
| Researcher Affiliation | Academia | Xihang Qiu1,2, Jiarong Cheng1,2, Yuhao Fang1, Wanpeng Zhang2, Yao Lu1, Ye Zhang1,2, Chun Li1; 1 Shenzhen MSU-BIT University, 2 Beijing Institute of Technology; EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Pre-training Process of DGN & SCN Modules. Algorithm 2 illustrates the training process of FedDISC, detailing how AFS alternately freezes the recovery and classifier modules to enable collaborative optimization. |
| Open Source Code | Yes | Code Repository: https://github.com/wdqdp/FedDISC. |
| Open Datasets | Yes | To verify the effectiveness of FedDISC, we conduct experiments on three benchmark conversational datasets: IEMOCAP [32], CMU-MOSI [33], and CMU-MOSEI [34]. |
| Dataset Splits | Yes | Dataset: As listed in Table 5, IEMOCAP4 includes four types of emotions: anger, happiness (where excitement is merged with happiness), sadness, and neutral [23]. We assign 3290, 1000, and 1241 utterances for train, valid, and test. The six-class dataset IEMOCAP6 encompasses: anger, happiness, sadness, neutral, excitement, and frustration. We assign 4810, 1000, and 1623 utterances for train, valid, and test. CMU-MOSI consists of 2199 utterances, where 1284, 299, 686 samples are set for train, valid, and test. CMU-MOSEI contains 22856 utterances, where 16326 are used for training, 1871 and 4659 samples are used for validation and testing. |
| Hardware Specification | Yes | All experiments were conducted on two NVIDIA L40S GPUs, each equipped with 48 GB of memory. |
| Software Dependencies | No | For each modality, we employ the corresponding pre-trained network to perform feature extraction. 1) Language: Pre-trained DeBERTa [43] is employed as the language feature extractor. ... 2) Vision: The pre-trained MA-Net [44] serves as the visual feature extractor... 3) Acoustic: Pre-trained wav2vec [45] serves as the acoustic feature extractor... |
| Experiment Setup | Yes | For federated learning, we set the number of clients as nc = 3, each client evenly and randomly allocated all training, validation, and test data. During the training process, we set the local epoch e to 1, and the communication round E to 3, the window size w = 2. We perform five-fold cross-validation [12] and report the mean values on the test set. All experiments were conducted on two NVIDIA L40S GPUs, each equipped with 48 GB of memory. |
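
The client allocation quoted in the Experiment Setup row (three clients, each receiving an even, random share of the data) can be illustrated with a minimal sketch. It assumes a seeded shuffle followed by round-robin dealing of utterance indices. The function name `partition_evenly`, the seed, and the dealing scheme are hypothetical and are not taken from the paper or its repository; only the client count and the split sizes come from the table above.

```python
import random

# Split sizes (train, valid, test) as quoted in the Dataset Splits row.
SPLITS = {
    "IEMOCAP4": (3290, 1000, 1241),
    "IEMOCAP6": (4810, 1000, 1623),
    "CMU-MOSEI": (16326, 1871, 4659),
}

NUM_CLIENTS = 3  # n_c = 3 in the Experiment Setup row


def partition_evenly(num_samples, num_clients=NUM_CLIENTS, seed=0):
    """Shuffle sample indices and deal them round-robin to clients.

    Hypothetical helper: the resulting shards are disjoint, cover every
    index exactly once, and differ in size by at most one sample.
    """
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # seeded for repeatability
    return [indices[c::num_clients] for c in range(num_clients)]


if __name__ == "__main__":
    for name, sizes in SPLITS.items():
        for split, n in zip(("train", "valid", "test"), sizes):
            shard_sizes = [len(p) for p in partition_evenly(n)]
            print(f"{name:10s} {split:5s} -> {shard_sizes}")
```

Applying the same helper to each of the train, validation, and test splits mirrors the statement that every client receives a share of all three; whether the authors partition per split or over the pooled data is not stated in the excerpt.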