Cross-modal Representation Flattening for Multi-modal Domain Generalization
Authors: Yunfeng FAN, Wenchao Xu, Haozhao Wang, Song Guo
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are performed on two benchmark datasets, EPIC-Kitchens and Human-Animal-Cartoon (HAC), with various modality combinations, demonstrating the effectiveness of our method under multi-source and single-source settings. |
| Researcher Affiliation | Academia | Yunfeng Fan1, Wenchao Xu1, Haozhao Wang2, Song Guo3; 1Department of Computing, The Hong Kong Polytechnic University; 2School of Computer Science and Technology, Huazhong University of Science and Technology; 3Hong Kong University of Science and Technology |
| Pseudocode | No | The paper does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block, nor structured steps formatted like code. |
| Open Source Code | Yes | Our code is open-sourced at https://github.com/fanyunfeng-bit/Cross-modal-Representation-Flattening-for-MMDG |
| Open Datasets | Yes | We utilize two benchmark datasets, EPIC-Kitchens [40] and Human-Animal-Cartoon (HAC) [28] |
| Dataset Splits | Yes | For all methods, we follow [41] and select the model with the best validation (in-domain) accuracy to evaluate generalization on test (out-of-domain) data. (See the checkpoint-selection sketch after this table.) |
| Hardware Specification | Yes | All experiments were conducted on an NVIDIA GeForce RTX 3090 GPU with a 3.9-GHz Intel Core i9-12900K CPU. |
| Software Dependencies | No | The paper mentions 'MMAction2 toolkit [44]' and 'Adam optimizer [49]' but does not provide specific version numbers for these or other key software components. |
| Experiment Setup | Yes | The dimensions of the uni-modal feature h are 2304 for video, 512 for audio, and 2048 for optical flow. For the projector Proj_k(·), we implement a multi-layer perceptron with two hidden layers of size 2048 and output size 128. We use the Adam optimizer [49] with a learning rate of 0.0001 and a batch size of 16. The scalar temperature parameter τ is set to 0.1. Additionally, we set λ1 = 2.0, λ2 = λ3 = 3.0, α in the Beta distribution to 0.1, and the SMA start iteration t0 to 400 for EPIC-Kitchens and 100 for HAC, respectively. The model is trained for 15 epochs, taking two hours. (These settings are wired together in the configuration sketch after this table.) |
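The Experiment Setup row pins down the projector architecture and the reported hyperparameters. The following PyTorch sketch wires those numbers together for reference; the names (`Projector`, `UNIMODAL_DIMS`, `SMA_START`, etc.) are our own placeholders and not identifiers from the released code.

```python
import torch
import torch.nn as nn

# Reported uni-modal feature dimensions: video 2304, audio 512, optical flow 2048.
UNIMODAL_DIMS = {"video": 2304, "audio": 512, "flow": 2048}


class Projector(nn.Module):
    """Sketch of Proj_k(.): an MLP with two hidden layers of size 2048, output 128."""

    def __init__(self, in_dim: int, hidden_dim: int = 2048, out_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.net(h)


# One projector per modality, trained with Adam at the reported learning rate.
projectors = nn.ModuleDict({k: Projector(d) for k, d in UNIMODAL_DIMS.items()})
optimizer = torch.optim.Adam(projectors.parameters(), lr=1e-4)

# Remaining reported hyperparameters, recorded here as constants.
TAU = 0.1                       # contrastive temperature tau
LAMBDA_1 = 2.0                  # loss weight lambda_1
LAMBDA_2 = LAMBDA_3 = 3.0       # loss weights lambda_2, lambda_3
MIXUP_ALPHA = 0.1               # alpha of the Beta distribution
SMA_START = {"EPIC-Kitchens": 400, "HAC": 100}  # SMA start iteration t0
BATCH_SIZE, EPOCHS = 16, 15

# Interpolation coefficient drawn from Beta(alpha, alpha), as in the setup above.
lam = torch.distributions.Beta(MIXUP_ALPHA, MIXUP_ALPHA).sample()
```

How these constants enter the losses and the SMA schedule is defined by the paper and the released code; this block only fixes the reported values in one place.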
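Similarly, the model-selection rule quoted under Dataset Splits (keep the checkpoint with the best in-domain validation accuracy, then report out-of-domain test performance) can be written as a short loop. `train_one_epoch` and `evaluate` are passed in as placeholders for the actual routines; this is a sketch of the selection protocol, not the authors' implementation.

```python
import torch.nn as nn


def select_best_checkpoint(model: nn.Module, train_loader, val_loader, optimizer,
                           train_one_epoch, evaluate, epochs: int = 15):
    """Train for `epochs` and keep the weights with the best in-domain validation accuracy."""
    best_acc, best_state = float("-inf"), None
    for _ in range(epochs):
        train_one_epoch(model, train_loader, optimizer)
        acc = evaluate(model, val_loader)  # in-domain validation accuracy
        if acc > best_acc:
            best_acc = acc
            best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
    # Restore the best checkpoint before out-of-domain (test) evaluation.
    model.load_state_dict(best_state)
    return model, best_acc
```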