Deep Multimodal Fusion by Channel Exchanging
Authors: Yikai Wang, Wenbing Huang, Fuchun Sun, Tingyang Xu, Yu Rong, Junzhou Huang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on semantic segmentation via RGB-D data and image translation through multi-domain input verify the effectiveness of our CEN compared to current state-of-the-art methods. Detailed ablation studies have also been carried out, which provably affirm the advantage of each component we propose. |
| Researcher Affiliation | Collaboration | Yikai Wang1, Wenbing Huang1, Fuchun Sun1, Tingyang Xu2, Yu Rong2, Junzhou Huang2 1Beijing National Research Center for Information Science and Technology (BNRist), State Key Lab on Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University 2Tencent AI Lab |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/yikaiw/CEN. |
| Open Datasets | Yes | We evaluate our method on two public datasets NYUDv2 [40] and SUN RGB-D [42] |
| Dataset Splits | Yes | Regarding NYUDv2, we follow the standard settings and adopt the split of 795 images for training and 654 for testing, with predicting standard 40 classes [16]. ... We use the public train-test split (5,285 vs 5,050). ... For efficiency, we sample 1,000 high-quality multimodal images for training, and 500 for validation. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version number or any other software dependencies with version numbers. |
| Experiment Setup | Yes | The initial learning rates are set to 5×10⁻⁴ and 3×10⁻³ for the encoder and decoder, respectively, both of which are halved every 100/150 epochs (total epochs 300/450) on NYUDv2 with ResNet101/ResNet152 and every 20 epochs (total epochs 60) on SUN RGB-D. The mini-batch size, momentum, and weight decay are set to 6, 0.9, and 10⁻⁵, respectively, on both datasets. We set λ = 5×10⁻³ in Equation 4 and the threshold to θ = 2×10⁻² in Equation 6. |
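
The step-halving learning-rate schedule quoted above can be sketched in plain Python. This is a minimal illustrative sketch, not code from the paper's repository: the function name `lr_at_epoch` and the assumption of discrete halving at exact step boundaries are ours.

```python
def lr_at_epoch(base_lr: float, epoch: int, step: int = 100, factor: float = 0.5) -> float:
    """Learning rate in effect at `epoch`, multiplied by `factor` every `step` epochs.

    Illustrative sketch of the schedule described in the paper (halved every
    100/150 epochs on NYUDv2, every 20 epochs on SUN RGB-D).
    """
    return base_lr * factor ** (epoch // step)

# Encoder on NYUDv2 with ResNet101: base LR 5e-4, halved every 100 epochs.
print(lr_at_epoch(5e-4, 0, step=100))    # 0.0005
print(lr_at_epoch(5e-4, 100, step=100))  # 0.00025
print(lr_at_epoch(5e-4, 250, step=100))  # 0.000125
```

The same function covers the SUN RGB-D setting by passing `step=20` over the 60-epoch run.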