Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Achieving Cross Modal Generalization with Multimodal Unified Representation
Authors: Yan Xia, Hai Huang, Jieming Zhu, Zhou Zhao
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various downstream tasks, i.e., cross-modal event classification, localization, cross-modal retrieval, query-based video segmentation, and cross-dataset event localization, demonstrate the effectiveness of our proposed methods. |
| Researcher Affiliation | Collaboration | ¹Zhejiang University, ²Shanghai Artificial Intelligence Laboratory, ³Huawei Noah's Ark Lab |
| Pseudocode | No | The paper contains a network overview figure and mathematical equations but no structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/haihuangcode/CMG. |
| Open Datasets | Yes | "We use VGGsound-AVEL [44, 45] to pre-train our unified representation, and divide it into several different sizes: 24K, 40K, 81K." and "Cross-modal event classification (AVE [46]):... Cross-modal event localization (AVVP [47]):... Cross-modal video segmentation (AVSBench-S4 [48]):..." |
| Dataset Splits | No | The paper mentions using VGGsound-AVEL (24K, 40K, 81K) for pre-training and AVE, AVVP, AVSBench-S4 for downstream tasks, but does not explicitly state the train/validation/test splits (percentages or counts) for any of these datasets, nor does it cite a source for predefined splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or specific computing environments used for running experiments. |
| Software Dependencies | No | The paper mentions using MindSpore but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | "where β is 0.25 for all our experiments... (50% in our setting, β is the same as in Eq 3)." and "The implementation details are provided in Appendix." |