Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

An Optimal Transport-based Latent Mixer for Robust Multi-modal Learning

Authors: Fengjiao Gong, Angxiao Yue, Hongteng Xu

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on multi-modal clustering and classification demonstrate that the models learned with the OTM method outperform the corresponding baselines. Experiments on multi-modal clustering and classification demonstrate the effectiveness of our method compared with the existing baselines, especially in the unaligned multi-modal scenarios.
Researcher Affiliation | Academia | (1) Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; (2) Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China
Pseudocode | Yes | Algorithm 1: Computation of FGW distance; Algorithm 2: Computation of FGW barycenter
Open Source Code | Yes | The code and more experimental results can be found at https://github.com/redLinmumu/OTM.
Open Datasets | Yes | For clustering tasks, we conduct the experiments on four conventional multi-modal datasets used in (Hu, Nie, and Li 2019; Guo et al. 2014; Gong, Nie, and Xu 2022). Each dataset contains well-aligned samples and corresponding labels, which are used only in the validation stage. The datasets for the classification and regression tasks are chosen from MultiBench (Liang et al. 2021), which is a well-known systematic large-scale multi-modal learning benchmark.
Dataset Splits | Yes | Each model is trained by five-fold cross-validation. For the clustering models, we apply the clustering purity to evaluate their performance. For the classification and regression models, we apply the classification accuracy and MAE (Willmott and Matsuura 2005) to evaluate them, respectively.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | No | The paper describes the model architecture (two-layer multi-layer perceptrons) and mentions using K-means for clustering and five-fold cross-validation. However, it does not provide specific hyperparameters such as learning rate, batch size, number of epochs, or optimizer settings in the main text.
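For readers unfamiliar with the FGW distance named in the Pseudocode row, the objective those algorithms minimize can be sketched numerically. The snippet below only evaluates the fused Gromov-Wasserstein cost of a fixed coupling matrix; it is not the paper's solver, and the function name, the test matrices, and the uniform coupling are illustrative assumptions:

```python
import numpy as np

def fgw_cost(M, C1, C2, T, alpha=0.5):
    """Fused Gromov-Wasserstein objective for a fixed coupling T.

    M  : (n, m) cross-domain feature-distance matrix
    C1 : (n, n) intra-domain structure matrix of the source
    C2 : (m, m) intra-domain structure matrix of the target
    T  : (n, m) transport plan; alpha trades off the feature
         (Wasserstein) term against the structure (GW) term.
    """
    # Linear term: expected feature cost under the coupling
    linear = np.sum(M * T)
    # Quadratic term: expected squared structure distortion,
    # sum over i,j,k,l of (C1[i,k] - C2[j,l])^2 * T[i,j] * T[k,l]
    D = (C1[:, None, :, None] - C2[None, :, None, :]) ** 2  # (n, m, n, m)
    quadratic = np.einsum('ijkl,ij,kl->', D, T, T)
    return (1 - alpha) * linear + alpha * quadratic

# Tiny illustration with uniform marginals and the independent coupling
rng = np.random.default_rng(0)
n, m = 4, 3
M = rng.random((n, m))
C1, C2 = rng.random((n, n)), rng.random((m, m))
T = np.full((n, m), 1.0 / (n * m))  # outer product of uniform marginals
print(fgw_cost(M, C1, C2, T, alpha=0.5))
```

At `alpha=0` the cost reduces to a plain Wasserstein-style feature cost, and at `alpha=1` to a pure Gromov-Wasserstein structure cost, which is the trade-off the fused distance is designed to expose.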
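The evaluation metrics quoted in the Dataset Splits row are standard but worth pinning down. A minimal sketch of clustering purity and MAE follows; the helper names are mine, and only the metric definitions come from common usage (MAE as in Willmott and Matsuura 2005):

```python
import numpy as np

def clustering_purity(y_true, y_pred):
    # For each predicted cluster, credit the size of its majority true class
    correct = 0
    for cluster in np.unique(y_pred):
        members = y_true[y_pred == cluster]
        correct += np.bincount(members).max()
    return correct / len(y_true)

def mean_absolute_error(y_true, y_pred):
    # Mean absolute error between predictions and targets
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

# Purity is invariant to a relabeling of the clusters:
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([2, 2, 0, 0, 1, 1])  # same partition, permuted labels
print(clustering_purity(y_true, y_pred))
print(mean_absolute_error([1.0, 2.0], [1.5, 2.5]))
```

Purity rewards any partition that matches the ground-truth grouping regardless of label names, which is why it suits unsupervised clustering evaluation where labels are used only at validation time.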
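The Experiment Setup row notes that the paper applies K-means to the learned representations. As a hedged sketch of that step only (Lloyd's algorithm with a deterministic farthest-point initialization, applied here to raw synthetic points rather than the paper's MLP latent codes):

```python
import numpy as np

def kmeans(X, k, iters=50):
    # Deterministic farthest-point initialization
    centers = [X[0]]
    for _ in range(1, k):
        dists = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[dists.argmax()])
    centers = np.array(centers)
    # Lloyd iterations: assign to nearest center, then recompute means
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated blobs should be recovered as two clusters
X = np.vstack([np.random.default_rng(1).normal(0, 0.1, (20, 2)),
               np.random.default_rng(2).normal(5, 0.1, (20, 2))])
labels, centers = kmeans(X, k=2)
```

In the paper's pipeline this clustering runs on the fused latent codes; the paper does not state its K-means initialization or iteration count, so those choices here are assumptions.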