Deep Multimodal Fusion by Channel Exchanging

Authors: Yikai Wang, Wenbing Huang, Fuchun Sun, Tingyang Xu, Yu Rong, Junzhou Huang

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on semantic segmentation via RGB-D data and image translation through multi-domain input verify the effectiveness of our CEN compared to current state-of-the-art methods. Detailed ablation studies have also been carried out, which provably affirm the advantage of each component we propose."
Researcher Affiliation | Collaboration | Yikai Wang1, Wenbing Huang1, Fuchun Sun1, Tingyang Xu2, Yu Rong2, Junzhou Huang2; 1Beijing National Research Center for Information Science and Technology (BNRist), State Key Lab on Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University; 2Tencent AI Lab
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Our code is available at https://github.com/yikaiw/CEN."
Open Datasets | Yes | "We evaluate our method on two public datasets NYUDv2 [40] and SUN RGB-D [42]"
Dataset Splits | Yes | "Regarding NYUDv2, we follow the standard settings and adopt the split of 795 images for training and 654 for testing, with predicting standard 40 classes [16]. ... We use the public train-test split (5,285 vs 5,050). ... For efficiency, we sample 1,000 high-quality multimodal images for training, and 500 for validation."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions PyTorch but does not specify its version number or any other software dependencies with version numbers.
Experiment Setup | Yes | "The initial learning rates are set to 5×10^-4 and 3×10^-3 for the encoder and decoder, respectively, both of which are reduced to their halves every 100/150 epochs (total epochs 300/450) on NYUDv2 with ResNet101/ResNet152 and every 20 epochs (total epochs 60) on SUN RGB-D. The mini-batch size, momentum and weight decay are selected as 6, 0.9, and 10^-5, respectively, on both datasets. We set λ = 5×10^-3 in Equation 4 and the threshold to θ = 2×10^-2 in Equation 6."
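The experiment-setup row above describes a step-decay schedule: the encoder and decoder learning rates are halved at fixed epoch intervals. A minimal plain-Python sketch of that schedule, assuming simple halving at each interval boundary (the helper name `lr_at_epoch` is my own; the numeric values are the ones quoted from the paper):

```python
# Hedged sketch, not the authors' code: step-decay learning-rate schedule
# reconstructed from the reported setup.

def lr_at_epoch(initial_lr: float, epoch: int, halve_every: int) -> float:
    """Return the learning rate after halving every `halve_every` epochs."""
    return initial_lr * 0.5 ** (epoch // halve_every)

# Reported values: encoder 5e-4, decoder 3e-3; on NYUDv2 with ResNet101
# both rates are halved every 100 epochs over 300 total epochs.
ENCODER_LR, DECODER_LR = 5e-4, 3e-3

# Encoder LR at the start of each decay phase on NYUDv2 (ResNet101).
schedule = [(epoch, lr_at_epoch(ENCODER_LR, epoch, 100)) for epoch in (0, 100, 200)]
print(schedule)
```

In a PyTorch training script this halving behavior would typically be expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)`, with separate parameter groups for the encoder and decoder rates.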