Efficient Deweather Mixture-of-Experts with Uncertainty-Aware Feature-Wise Linear Modulation
Authors: Rongyu Zhang, Yulin Luo, Jiaming Liu, Huanrui Yang, Zhen Dong, Denis Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Yuan Du, Shanghang Zhang
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The conducted experiments on the multi-deweather task show that our MoFME outperforms the state-of-the-art in the image restoration quality by 0.1-0.2 dB while saving more than 74% of parameters and 20% inference time over the conventional MoE counterpart. Experiments on the downstream segmentation and classification tasks further demonstrate the generalizability of MoFME to real open-world applications. |
| Researcher Affiliation | Collaboration | Nanjing University; National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University; University of California, Berkeley; Panasonic |
| Pseudocode | No | The paper includes schematic illustrations of its network architecture (Figure 3) but does not provide any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code, nor does it include a link to a code repository. |
| Open Datasets | Yes | All-weather (Valanarasu et al. 2022) and Rain/Haze Cityscapes (Hu et al. 2019; Sakaridis, Dai, and Van Gool 2018) datasets are used to evaluate deweathering and downstream segmentation. The CIFAR10 dataset is for the downstream image classification task. |
| Dataset Splits | No | Input images are randomly cropped to 256×256 size for training, and non-overlapping crops of the same size are used at test time. We randomly flip and rotate images for data augmentation. The paper does not explicitly state specific training, validation, and testing dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | We implement our method with the PyTorch framework using 4 NVIDIA A100 GPUs. |
| Software Dependencies | No | We implement our method with the PyTorch framework using 4 NVIDIA A100 GPUs. The paper mentions PyTorch but does not specify a version number or other software dependencies with their respective versions. |
| Experiment Setup | Yes | We train the network for 200 epochs with a batch size of 64. The initial learning rate of the AdamW optimizer and cosine LR scheduler is set to 0.5×10⁻⁴ and is gradually reduced to 10⁻⁶. We use a warm-up stage with three epochs. Input images are randomly cropped to 256×256 size for training, and non-overlapping crops of the same size are used at test time. We randomly flip and rotate images for data augmentation. |
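The crop-and-augment recipe quoted in the Dataset Splits and Experiment Setup rows is concrete enough to sketch. Below is a minimal PyTorch/torchvision sketch of a paired random-crop, flip, and rotate augmentation; the `paired_augment` helper, the 90-degree rotation choice, and the exact transform composition are assumptions, since the paper only states that images are randomly cropped to 256×256 and randomly flipped and rotated.

```python
import random

import torchvision.transforms as T
import torchvision.transforms.functional as TF


def paired_augment(degraded, clean, crop_size=256):
    """Apply one identical random crop, flip, and 90-degree rotation to a
    degraded/clean image pair so the supervision stays pixel-aligned.
    A sketch; the paper does not give the exact transform composition."""
    # Shared 256x256 crop window for both images.
    i, j, h, w = T.RandomCrop.get_params(degraded, output_size=(crop_size, crop_size))
    degraded, clean = TF.crop(degraded, i, j, h, w), TF.crop(clean, i, j, h, w)
    # Random horizontal flip.
    if random.random() < 0.5:
        degraded, clean = TF.hflip(degraded), TF.hflip(clean)
    # Random rotation by a multiple of 90 degrees.
    k = random.randint(0, 3)
    if k:
        degraded, clean = TF.rotate(degraded, 90 * k), TF.rotate(clean, 90 * k)
    return degraded, clean
```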
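Likewise, the reported optimization schedule (AdamW at 0.5×10⁻⁴, cosine-annealed to 10⁻⁶ over 200 epochs with a 3-epoch warm-up, batch size 64) can be approximated as follows. This is a sketch under stated assumptions: a linear warm-up implemented via `LambdaLR`, a per-epoch scheduler step, and a `Conv2d` placeholder standing in for the unreleased MoFME network; none of these implementation details are given in the paper beyond the values quoted above.

```python
import math

import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

EPOCHS, WARMUP_EPOCHS, BATCH_SIZE = 200, 3, 64
LR_INIT, LR_MIN = 0.5e-4, 1e-6

# Placeholder module; stands in for the MoFME restoration network.
model = torch.nn.Conv2d(3, 3, 3)

optimizer = AdamW(model.parameters(), lr=LR_INIT)


def lr_factor(epoch):
    # Linear warm-up for the first three epochs, then cosine decay to LR_MIN.
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / max(1, EPOCHS - WARMUP_EPOCHS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return (LR_MIN + (LR_INIT - LR_MIN) * cosine) / LR_INIT


scheduler = LambdaLR(optimizer, lr_lambda=lr_factor)

for epoch in range(EPOCHS):
    # ... one pass over the deweathering training set with batch size 64,
    #     with loss.backward() and optimizer.step() per mini-batch ...
    optimizer.step()  # shown here only so the scheduler step order is valid
    scheduler.step()
```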