Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
An Optimal Transport-based Latent Mixer for Robust Multi-modal Learning
Authors: Fengjiao Gong, Angxiao Yue, Hongteng Xu
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on multi-modal clustering and classification demonstrate that the models learned with the OTM method outperform the corresponding baselines, especially in the unaligned multi-modal scenarios. |
| Researcher Affiliation | Academia | 1Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China 2Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China |
| Pseudocode | Yes | Algorithm 1: Computation of FGW distance Algorithm 2: Computation of FGW barycenter |
| Open Source Code | Yes | The code and more experimental results can be found at https://github.com/redLinmumu/OTM. |
| Open Datasets | Yes | For clustering tasks, we conduct the experiments on four conventional multi-modal datasets used in (Hu, Nie, and Li 2019; Guo et al. 2014; Gong, Nie, and Xu 2022). Each dataset contains well-aligned samples and corresponding labels, which are used only in the validation stage. The datasets for the classification and regression tasks are chosen from Multibench (Liang et al. 2021), which is a well-known systematic large-scale multi-modal learning benchmark. |
| Dataset Splits | Yes | Each model is trained by five-fold cross-validation. For the clustering models, we apply the clustering purity to evaluate their performance. For the classification and regression models, we apply the classification accuracy and MAE (Willmott and Matsuura 2005) to evaluate them, respectively. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | No | The paper describes the model architecture (two-layer multi-layer perceptrons) and mentions using K-means for clustering and five-fold cross-validation. However, it does not provide specific hyperparameters such as learning rate, batch size, number of epochs, or optimizer settings in the main text. |