Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process
Authors: Tsai Hor Chan, Feng Wu, Yihang Chen, Guosheng Yin, Lequan Yu
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on several multimodal datasets demonstrate the superior performance of our model over other competitors. Ablation analysis further validates the effectiveness of DP in aligning modality distributions and its robustness to changes in key hyperparameters. |
| Researcher Affiliation | Academia | Tsai Hor Chan, University of Pennsylvania, Tsaihor.Chan@PennMedicine.upenn.edu; Feng Wu, University of Hong Kong, EMAIL; Yihang Chen, University of Hong Kong, EMAIL; Guosheng Yin, University of Hong Kong, EMAIL; Lequan Yu, University of Hong Kong, EMAIL |
| Pseudocode | Yes | Algorithm 1: Sampling algorithm of our proposed DPMM framework. Input: Multimodal encoder f_Θ(·) with parameter set Θ, concentration parameter η. 1: Means and covariances of Gaussian distributions {(µ_mk, Σ_mk), m = 1, ..., M, k = 1, ..., K}. 2: Training set D_tr = {(x_1^(i), ..., x_M^(i), y^(i))}_{i=1}^n. Output: Trained f_Θ. 3: Initialize µ_mk = 0 and Σ_mk = I. 4: for (x_1^(i), ..., x_M^(i), y^(i)) in D_tr do. 5: Sample π_mk with Eq. (1). 6: ŷ^(i) = f_Θ(x^(i)), with x^(i) = (x_1^(i), ..., x_M^(i)) (forward propagation). 7: Compute task-specific loss L_obj with ŷ^(i) and y^(i). 8: Compute KL(q ‖ p) and hence the ELBO. 9: Back-propagate the ELBO to update Θ, µ, Σ. 10: end for. 11: Return: Trained f_Θ. |
| Open Source Code | Yes | Code is anonymously available at https://github.com/HKU-MedAI/DPMM.git |
| Open Datasets | Yes | We evaluate the performance of our DPMM on two multimodal large-scale clinical datasets MIMIC-III and MIMIC-IV, following the previous work [18]... Moreover, to show the generalization of our framework, we also conduct experiments on CMU-MOSI and POM datasets following the implementations of Liu et al. [32] |
| Dataset Splits | Yes | MIMIC-III: This dataset includes 46,520 ICU stays, each containing 17 clinical variables. Following the methodology in [15], the dataset is divided into training, validation, and test sets using a 70%-15%-15% split. MIMIC-IV: This dataset comprises 21,139 ICU stays and includes 17 clinical variables. Following [18], the data are divided into 70% for training, 10% for validation, and 20% for testing. |
| Hardware Specification | Yes | All experiments are performed on a single RTX-3090 GPU. |
| Software Dependencies | Yes | DPMM is developed using Python 3.11 and PyTorch 1.9. In line with MedFuse [18], ResNet34 [19] is employed as the backbone encoder for CXR images, a two-layer LSTM [14] is utilized for encoding time-series data, and pre-trained TinyBERT [25] is adopted for clinical notes. |
| Experiment Setup | Yes | We train all models for 100 epochs using the training set, selecting the best-performing model based on validation AUROC. The Adam optimizer is utilized for optimization, and early stopping is employed if the validation AUROC shows no improvement over 15 consecutive epochs to mitigate overfitting... The batch size is configured as 32 for the MIMIC-IV & CXR datasets and 16 for the MIMIC-III & NOTE datasets... Hyperparameters are tuned using grid search on the validation set, and the test set results are based on the best configuration. The search space includes: Dropout ratio: {0, 0.1, 0.2, 0.3}; Learning rate: {1×10⁻⁴, 5×10⁻⁵, 1×10⁻⁵}; Concentration rate η: {0.1, 0.5, 1, 2, 5}; Number of mixture components K: {2, 3, 4, 5}; Temperature: {0.001, 0.005, 0.01, 0.05, 0.08}; Regularization parameter λ_DP (adjusting the strength of DP assumption): {1×10⁻⁵, 5×10⁻⁶, 1×10⁻⁶} |
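
To make the reported experiment setup concrete, the following is a minimal sketch of the grid-search space and the early-stopping rule quoted in the Experiment Setup row. It is not taken from the authors' repository: the dictionary keys and the helper names `iter_configs` and `select_best_epoch` are hypothetical, and it assumes a full Cartesian grid over all six hyperparameters, which the quoted text does not confirm.

```python
from itertools import product

# Search space as quoted in the Experiment Setup row.
# Key names are illustrative assumptions, not the authors' config keys.
SEARCH_SPACE = {
    "dropout": [0.0, 0.1, 0.2, 0.3],
    "learning_rate": [1e-4, 5e-5, 1e-5],
    "eta": [0.1, 0.5, 1, 2, 5],              # concentration parameter
    "num_components": [2, 3, 4, 5],          # mixture components K
    "temperature": [0.001, 0.005, 0.01, 0.05, 0.08],
    "lambda_dp": [1e-5, 5e-6, 1e-6],         # strength of the DP regularizer
}


def iter_configs(space):
    """Yield one dict per point of the Cartesian grid (assumed full grid)."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def select_best_epoch(val_aurocs, patience=15, max_epochs=100):
    """Scan per-epoch validation AUROCs and apply early stopping.

    Returns (best_epoch, best_auroc, last_epoch_run), with 0-based epochs.
    Training stops once `patience` consecutive epochs bring no improvement.
    """
    best_epoch, best_auroc, last_epoch = None, float("-inf"), None
    for epoch, auroc in enumerate(val_aurocs[:max_epochs]):
        last_epoch = epoch
        if best_epoch is None or auroc > best_auroc:
            best_epoch, best_auroc = epoch, auroc
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, best_auroc, last_epoch


if __name__ == "__main__":
    configs = list(iter_configs(SEARCH_SPACE))
    print(len(configs), "candidate configurations")
    # Synthetic AUROC curve, for illustration only (not results from the paper).
    curve = [0.70, 0.75] + [0.74] * 30
    print(select_best_epoch(curve))
```

Under the full-grid assumption the space holds 4 × 3 × 5 × 4 × 5 × 3 = 3600 configurations, so in practice each one would be trained with the early-stopping rule and the configuration with the highest validation AUROC kept for the test-set evaluation.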