Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

MANGO: Multimodal Attention-based Normalizing Flow Approach to Fusion Learning

Authors: Thanh-Dat Truong, Christophe Bobda, Nitin Agarwal, Khoa Luu

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results on three different multimodal learning tasks, i.e., semantic segmentation, image-to-image translation, and movie genre classification, have illustrated the state-of-the-art (SoTA) performance of the proposed approach.
Researcher Affiliation | Academia | Thanh-Dat Truong (1), Christophe Bobda (2), Nitin Agarwal (3, 4), Khoa Luu (1). (1) CVIU Lab, University of Arkansas, USA; (2) University of Florida, USA; (3) COSMOS Research Center, University of Arkansas, Little Rock, USA; (4) ICSI, University of California, Berkeley, USA.
Pseudocode | No | The paper describes its methods using mathematical equations and descriptive text (e.g., Section 3.1 describes the Invertible Cross-Attention (ICA) layer through Eqns. (3), (4), and (5)), but it does not present any clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | The NeurIPS Paper Checklist states: 'Answer: [NA] Justification: The code will be published may the paper be accepted.'
Open Datasets | Yes | Semantic Segmentation. This task uses the two homogeneous inputs of RGB and Depth images to predict the segmentation maps. We perform experiments on NYUDv2 [36] and SUN RGB-D [45]. ... Image-to-Image Translation. Following the standard protocol in [65], we adopt the Taskonomy [75] for the multimodal image translation task. ... MM-IMDB Movie Genre Classification. MM-IMDB is a large-scale multimodal dataset for movie genre classification. We adopt the training and testing split of [70] for fair comparisons.
Dataset Splits | Yes | NYUDv2 consists of 795/654 images for training and testing splits, SUN RGB-D includes 5,285/5,050 samples for training and testing. ... We use a subset of 1,000 high-quality images for training and 500 for validation. ... the data in our experiments consists of 15,552 data for training and 2,608 for validation.
Hardware Specification | Yes | Our experiments are conducted on the 4 NVIDIA A100 GPUs.
Software Dependencies | No | The paper does not state the software dependencies needed to replicate the experiments together with their version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | Our bijective network G consists of L = 12 cross-attention blocks. ... Our training uses the same learning hyper-parameters from [65] and an input image size of 256 × 256 for fair comparisons.
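As a quick aid for anyone attempting a reproduction, the split sizes quoted in the Dataset Splits row can be tabulated and their train fractions computed. The following is a minimal Python sketch, not part of the paper or its (unreleased) code. The dataset labels attached to the 1,000/500 and 15,552/2,608 splits are assumptions, because the quoted excerpt elides which dataset each one belongs to. The names `SPLITS`, `train_fraction`, and `summarize` are hypothetical.

```python
# Split sizes quoted in the Dataset Splits row. The labels on the last two
# entries are assumptions: the quoted excerpt elides which dataset each
# split belongs to.
SPLITS = {
    "NYUDv2 (train/test)": (795, 654),
    "SUN RGB-D (train/test)": (5_285, 5_050),
    "Taskonomy subset, assumed (train/val)": (1_000, 500),
    "MM-IMDB, assumed (train/val)": (15_552, 2_608),
}


def train_fraction(train: int, held_out: int) -> float:
    """Return the share of the listed samples that is used for training."""
    return train / (train + held_out)


def summarize(splits: dict[str, tuple[int, int]]) -> list[str]:
    """Format the total size and train fraction of each reported split."""
    lines = []
    for name, (train, held_out) in splits.items():
        total = train + held_out
        frac = train_fraction(train, held_out)
        lines.append(f"{name}: {total} samples, {frac:.1%} train")
    return lines


if __name__ == "__main__":
    for line in summarize(SPLITS):
        print(line)
```

Running the script prints one line per split. For example, NYUDv2 totals 1,449 images, about 54.9% of which are used for training, while SUN RGB-D is split almost evenly (about 51.1% train).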