Efficient Deweather Mixture-of-Experts with Uncertainty-Aware Feature-Wise Linear Modulation

Authors: Rongyu Zhang, Yulin Luo, Jiaming Liu, Huanrui Yang, Zhen Dong, Denis Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Yuan Du, Shanghang Zhang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The conducted experiments on the multi-deweather task show that our MoFME outperforms the state-of-the-art in the image restoration quality by 0.1-0.2 dB while saving more than 74% of parameters and 20% inference time over the conventional MoE counterpart. Experiments on the downstream segmentation and classification tasks further demonstrate the generalizability of MoFME to real open-world applications.
Researcher Affiliation | Collaboration | Nanjing University; National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University; University of California, Berkeley; Panasonic
Pseudocode | No | The paper includes schematic illustrations of its network architecture (Figure 3) but does not provide any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing its source code, nor does it include a link to a code repository.
Open Datasets | Yes | All-weather (Valanarasu et al. 2022) and Rain/Haze Cityscapes (Hu et al. 2019; Sakaridis, Dai, and Van Gool 2018) datasets are used to evaluate deweathering and downstream segmentation. The CIFAR10 dataset is used for the downstream image classification task.
Dataset Splits | No | Input images are randomly cropped to 256×256 size for training, and non-overlapping crops of the same size are used at test time. Images are randomly flipped and rotated for data augmentation (a minimal augmentation sketch follows the table). The paper does not explicitly state specific training, validation, and testing dataset splits (e.g., percentages or sample counts).
Hardware Specification | Yes | We implement our method with the PyTorch framework using 4 NVIDIA A100 GPUs.
Software Dependencies | No | We implement our method with the PyTorch framework using 4 NVIDIA A100 GPUs. The paper mentions PyTorch but does not specify a version number or other software dependencies with their respective versions.
Experiment Setup | Yes | We train the network for 200 epochs with a batch size of 64. The initial learning rate of the AdamW optimizer and cosine LR scheduler is set to 0.5×10^-4 and is gradually reduced to 10^-6. We use a warm-up stage with three epochs. Input images are randomly cropped to 256×256 size for training, and non-overlapping crops of the same size are used at test time. We randomly flip and rotate images for data augmentation.
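
The augmentation described in the Dataset Splits and Experiment Setup rows (random 256×256 crops for training, random flips and rotations, non-overlapping crops at test time) can be expressed as a small PyTorch/torchvision sketch. The paper does not give the transform code itself, so the paired-crop helper below (PairedWeatherAugment) and the restriction to 90-degree rotations are assumptions for illustration only.

```python
import random

import torchvision.transforms.functional as TF
from torchvision import transforms


class PairedWeatherAugment:
    """Apply the same random crop/flip/rotation to a degraded input and its
    clean ground truth, as is standard for deweathering training pairs.
    (Hypothetical helper; not from the paper.)"""

    def __init__(self, crop_size=256):
        self.crop_size = crop_size

    def __call__(self, degraded, clean):
        # Random 256x256 crop, sampled once and applied to both images.
        i, j, h, w = transforms.RandomCrop.get_params(
            degraded, output_size=(self.crop_size, self.crop_size))
        degraded = TF.crop(degraded, i, j, h, w)
        clean = TF.crop(clean, i, j, h, w)

        # Random horizontal flip.
        if random.random() < 0.5:
            degraded, clean = TF.hflip(degraded), TF.hflip(clean)

        # Random rotation; multiples of 90 degrees are assumed here.
        angle = random.choice([0, 90, 180, 270])
        if angle:
            degraded = TF.rotate(degraded, angle)
            clean = TF.rotate(clean, angle)

        return degraded, clean
```

At test time the paper instead evaluates on non-overlapping 256×256 crops rather than randomly sampled ones.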
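
The Experiment Setup row also reports the optimization recipe: AdamW with an initial learning rate of 0.5×10^-4, cosine decay down to 10^-6 over 200 epochs, a three-epoch warm-up, and batch size 64. The paper does not say how the warm-up and cosine schedule are combined, so the sketch below assumes a linear warm-up chained to CosineAnnealingLR via SequentialLR; the build_optimizer helper and warm-up start factor are hypothetical.

```python
import torch


def build_optimizer(model, epochs=200, warmup_epochs=3):
    """Return (optimizer, scheduler) matching the reported setup:
    AdamW, initial LR 0.5e-4, cosine decay to 1e-6, 3 warm-up epochs."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=0.5e-4)

    # Linear warm-up over the first 3 epochs (start factor is an assumption).
    warmup = torch.optim.lr_scheduler.LinearLR(
        optimizer, start_factor=0.01, total_iters=warmup_epochs)

    # Cosine decay from 0.5e-4 down to 1e-6 over the remaining epochs.
    cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=epochs - warmup_epochs, eta_min=1e-6)

    scheduler = torch.optim.lr_scheduler.SequentialLR(
        optimizer, schedulers=[warmup, cosine], milestones=[warmup_epochs])
    return optimizer, scheduler
```

In this sketch scheduler.step() would be called once per epoch; the 200-epoch budget and batch size of 64 are taken directly from the paper.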