Achieving Cross Modal Generalization with Multimodal Unified Representation
Authors: Yan Xia, Hai Huang, Jieming Zhu, Zhou Zhao
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various downstream tasks, i.e., cross-modal event classification, localization, cross-modal retrieval, query-based video segmentation, and cross-dataset event localization, demonstrate the effectiveness of our proposed methods. |
| Researcher Affiliation | Collaboration | Zhejiang University; Shanghai Artificial Intelligence Laboratory; Huawei Noah's Ark Lab |
| Pseudocode | No | The paper contains a network overview figure and mathematical equations but no structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/haihuangcode/CMG. |
| Open Datasets | Yes | We use VGGsound-AVEL [44, 45] to pre-train our unified representation, and divide it into several different sizes: 24K, 40K, 81K. Downstream tasks: cross-modal event classification (AVE [46]): ...; cross-modal event localization (AVVP [47]): ...; cross-modal video segmentation (AVSBench-S4 [48]): ... |
| Dataset Splits | No | The paper mentions using VGGsound-AVEL (24K, 40K, 81K) for pre-training and AVE, AVVP, AVSBench-S4 for downstream tasks, but does not explicitly state the train/validation/test splits (percentages or counts) for any of these datasets, nor does it cite a source for predefined splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or specific computing environments used for running experiments. |
| Software Dependencies | No | The paper mentions using MindSpore but does not provide version numbers for any software dependencies. |
| Experiment Setup | Yes | where β is 0.25 for all our experiments..., (50% in our setting, β is the same as in Eq. 3), and The implementation details are provided in the Appendix. A hedged sketch of the β-weighted term follows the table. |
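Eq. 3 itself is not reproduced in this report. The quoted β = 0.25 matches the standard commitment coefficient used in vector-quantized (VQ-VAE-style) codebook training, which is consistent with the paper's unified discrete representation. Assuming Eq. 3 is such a quantization loss, the sketch below illustrates how a β-weighted commitment term is typically computed; the function name, tensor shapes, and nearest-neighbor lookup are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def vq_quantization_loss(z_e: torch.Tensor, codebook: torch.Tensor, beta: float = 0.25):
    """Hypothetical sketch of a VQ-style quantization loss with commitment
    coefficient beta = 0.25, as quoted from the paper's experiment setup.

    z_e:      encoder outputs, shape (batch, dim)
    codebook: learnable code vectors, shape (num_codes, dim)
    """
    # Nearest-neighbor lookup: assign each encoder output to its closest code.
    distances = torch.cdist(z_e, codebook)   # (batch, num_codes)
    indices = distances.argmin(dim=1)        # (batch,)
    e = codebook[indices]                    # quantized vectors, (batch, dim)

    # Codebook loss pulls the selected codes toward the (frozen) encoder
    # outputs; the commitment loss, scaled by beta, keeps the encoder
    # committed to its assigned codes. The straight-through estimator used
    # to pass gradients through the discrete lookup is omitted for brevity.
    codebook_loss = F.mse_loss(e, z_e.detach())
    commitment_loss = beta * F.mse_loss(z_e, e.detach())
    return codebook_loss + commitment_loss, indices
```

With β fixed at 0.25 across all experiments (as the paper states), this term is the usual trade-off between codebook updates and encoder stability; only the value of β is taken from the paper, everything else here is a generic reconstruction.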