MogaNet: Multi-order Gated Aggregation Network

Authors: Siyuan Li, Zedong Wang, Zicheng Liu, Cheng Tan, Haitao Lin, Di Wu, Zhiyuan Chen, Jiangbin Zheng, Stan Z. Li

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | MogaNet exhibits great scalability, impressive parameter efficiency, and competitive performance compared with state-of-the-art ViTs and ConvNets on ImageNet and various downstream vision benchmarks, including COCO object detection, ADE20K semantic segmentation, 2D and 3D human pose estimation, and video prediction. Notably, MogaNet reaches 80.0% and 87.8% accuracy with 5.2M and 181M parameters on ImageNet-1K, outperforming ParC-Net and ConvNeXt-L while saving 59% FLOPs and 17M parameters, respectively.
Researcher Affiliation | Academia | 1) AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China; 2) College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Figure 4 shows architectural diagrams, not algorithmic steps.
Open Source Code | Yes | The source code is available at https://github.com/Westlake-AI/MogaNet.
Open Datasets | Yes | "To impartially evaluate and compare MogaNet with the leading network architectures, we conduct extensive experiments across various popular vision tasks, including image classification, object detection, instance and semantic segmentation, 2D and 3D pose estimation, and video prediction."
Dataset Splits | Yes | "For classification experiments on ImageNet (Deng et al., 2009), we train our MogaNet following the standard procedure (Touvron et al., 2021a; Liu et al., 2021) on ImageNet-1K (IN-1K) for a fair comparison," training 300 epochs with the AdamW (Loshchilov & Hutter, 2019) optimizer, a basic learning rate of 1×10⁻³, and a cosine scheduler (Loshchilov & Hutter, 2016).
Hardware Specification | Yes | The experiments are implemented with PyTorch and run on NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions software such as PyTorch, OpenMixup, and timm but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | "We train all MogaNet models for 300 epochs by AdamW (Loshchilov & Hutter, 2019) optimizer using a batch size of 1024, a basic learning rate of 1×10⁻³, a weight decay of 0.05, and a cosine learning rate scheduler (Loshchilov & Hutter, 2016) with 5 epochs of linear warmup (Devlin et al., 2018)."
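The reported schedule (base LR 1×10⁻³, cosine decay over 300 epochs, 5 epochs of linear warmup) can be sketched as a per-epoch learning-rate function. This is a minimal illustration, not the authors' code: the function name `lr_at_epoch` and the assumed minimum learning rate of 0 are our own, since the paper excerpt does not state an LR floor.

```python
import math

# Values reported in the paper's training setup
BASE_LR = 1e-3
WARMUP_EPOCHS = 5
TOTAL_EPOCHS = 300
MIN_LR = 0.0  # assumption: the excerpt does not specify a minimum LR

def lr_at_epoch(epoch: int) -> float:
    """Cosine learning-rate schedule with linear warmup (hypothetical helper)."""
    if epoch < WARMUP_EPOCHS:
        # Linear warmup: ramp from 0 up to BASE_LR over the first 5 epochs
        return BASE_LR * (epoch + 1) / WARMUP_EPOCHS
    # Cosine decay from BASE_LR toward MIN_LR over the remaining epochs
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

print(f"epoch 0:   {lr_at_epoch(0):.2e}")    # mid-warmup
print(f"epoch 4:   {lr_at_epoch(4):.2e}")    # end of warmup, reaches BASE_LR
print(f"epoch 299: {lr_at_epoch(299):.2e}")  # near zero at the end of training
```

In a PyTorch training loop this schedule would typically be realized with `torch.optim.AdamW` plus a warmup-aware cosine scheduler (e.g., the one in timm, which the paper lists among its dependencies).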