Achieving Cross Modal Generalization with Multimodal Unified Representation

Authors: Yan Xia, Hai Huang, Jieming Zhu, Zhou Zhao

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on various downstream tasks, i.e., cross-modal event classification, localization, cross-modal retrieval, query-based video segmentation, and cross-dataset event localization, demonstrate the effectiveness of our proposed methods."
Researcher Affiliation | Collaboration | Zhejiang University, Shanghai Artificial Intelligence Laboratory, Huawei Noah's Ark Lab
Pseudocode | No | The paper contains a network overview figure and mathematical equations but no structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The code is available at https://github.com/haihuangcode/CMG."
Open Datasets | Yes | "We use VGGsound-AVEL [44, 45] to pre-train our unified representation, and divide it into several different sizes: 24K, 40K, 81K." Downstream tasks use AVE [46] (cross-modal event classification), AVVP [47] (cross-modal event localization), and AVSBench-S4 [48] (cross-modal video segmentation).
Dataset Splits | No | The paper mentions using VGGsound-AVEL (24K, 40K, 81K) for pre-training and AVE, AVVP, and AVSBench-S4 for downstream tasks, but does not explicitly state the train/validation/test splits (percentages or counts) for any of these datasets, nor does it cite a source for predefined splits.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or the computing environment used to run the experiments.
Software Dependencies | No | The paper mentions using MindSpore but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | "where β is 0.25 for all our experiments", "(50% in our setting, β is the same as in Eq 3)", and "The implementation details are provided in Appendix."
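
The quoted β = 0.25 matches the standard commitment coefficient used when training a discrete codebook, and the paper's unified representation is codebook-based. The sketch below is only an illustration of what such a β-weighted commitment term typically looks like; it assumes Eq 3 is a VQ-VAE-style quantization loss, which this report cannot verify, and the function and variable names (`vq_commitment_loss`, `codebook`, `z_e`) are hypothetical rather than taken from the CMG repository. PyTorch is used purely for illustration.

```python
# Illustrative sketch only: assumes beta plays the role of a VQ-VAE-style
# commitment coefficient (beta = 0.25, as quoted above). All names here are
# hypothetical and not taken from the CMG codebase.
import torch
import torch.nn.functional as F

def vq_commitment_loss(z_e: torch.Tensor, codebook: torch.Tensor, beta: float = 0.25):
    """Quantize encoder outputs z_e (B, D) against a codebook (K, D) and
    return the quantized vectors plus the codebook/commitment loss."""
    # Nearest codebook entry for each encoder output (L2 distance).
    distances = torch.cdist(z_e, codebook)   # (B, K)
    indices = distances.argmin(dim=1)        # (B,)
    z_q = codebook[indices]                  # (B, D)

    # Codebook term pulls code vectors toward the (frozen) encoder outputs;
    # the commitment term, weighted by beta, keeps the encoder close to its
    # assigned codes.
    codebook_loss = F.mse_loss(z_q, z_e.detach())
    commitment_loss = beta * F.mse_loss(z_e, z_q.detach())

    # Straight-through estimator so gradients flow back to the encoder.
    z_q = z_e + (z_q - z_e).detach()
    return z_q, codebook_loss + commitment_loss
```
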