MogaNet: Multi-order Gated Aggregation Network
Authors: Siyuan Li, Zedong Wang, Zicheng Liu, Cheng Tan, Haitao Lin, Di Wu, Zhiyuan Chen, Jiangbin Zheng, Stan Z. Li
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | MogaNet exhibits great scalability, impressive efficiency of parameters, and competitive performance compared to state-of-the-art ViTs and ConvNets on ImageNet and various downstream vision benchmarks, including COCO object detection, ADE20K semantic segmentation, 2D&3D human pose estimation, and video prediction. Notably, MogaNet hits 80.0% and 87.8% accuracy with 5.2M and 181M parameters on ImageNet-1K, outperforming ParC-Net and ConvNeXt-L, while saving 59% FLOPs and 17M parameters, respectively. |
| Researcher Affiliation | Academia | (1) AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China; (2) College of Computer Science and Technology, Zhejiang University, Hangzhou, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Figure 4 shows architectural diagrams, not algorithmic steps. |
| Open Source Code | Yes | The source code is available at https://github.com/Westlake-AI/MogaNet. |
| Open Datasets | Yes | To impartially evaluate and compare MogaNet with the leading network architectures, we conduct extensive experiments across various popular vision tasks, including image classification, object detection, instance and semantic segmentation, 2D and 3D pose estimation, and video prediction. |
| Dataset Splits | Yes | For classification experiments on ImageNet (Deng et al., 2009), we train our MogaNet following the standard procedure (Touvron et al., 2021a; Liu et al., 2021) on ImageNet-1K (IN-1K) for a fair comparison, training 300 epochs with the AdamW (Loshchilov & Hutter, 2019) optimizer, a basic learning rate of 1×10⁻³, and a cosine scheduler (Loshchilov & Hutter, 2016). |
| Hardware Specification | Yes | The experiments are implemented with Py Torch and run on NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions software like PyTorch, OpenMixup, and timm but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | We train all MogaNet models for 300 epochs with the AdamW (Loshchilov & Hutter, 2019) optimizer using a batch size of 1024, a basic learning rate of 1×10⁻³, a weight decay of 0.05, and a cosine learning rate scheduler (Loshchilov & Hutter, 2016) with 5 epochs of linear warmup (Devlin et al., 2018). |
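For reference, the training recipe quoted in the table (AdamW, base LR 1×10⁻³, weight decay 0.05, 300 epochs, cosine schedule with 5-epoch linear warmup) can be expressed as a short PyTorch sketch. This is a minimal illustration, not the authors' released code: the `nn.Linear` stand-in and the loop skeleton are hypothetical placeholders, while the hyperparameter values come from the paper.

```python
import math
import torch.nn as nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

EPOCHS = 300         # total training epochs (from the paper)
WARMUP_EPOCHS = 5    # linear warmup epochs (from the paper)
BASE_LR = 1e-3       # basic learning rate (from the paper)
WEIGHT_DECAY = 0.05  # weight decay (from the paper)

# Hypothetical stand-in for a MogaNet model instance.
model = nn.Linear(8, 8)

optimizer = AdamW(model.parameters(), lr=BASE_LR, weight_decay=WEIGHT_DECAY)

def lr_lambda(epoch: int) -> float:
    """Per-epoch LR multiplier: linear warmup, then cosine decay to zero."""
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda)

for epoch in range(EPOCHS):
    # ... forward/backward/optimizer.step() over the IN-1K loader
    # (batch size 1024 in the paper) would go here ...
    scheduler.step()  # advance the warmup + cosine schedule once per epoch
```

In practice the released code builds this schedule through timm/OpenMixup training utilities rather than a hand-rolled `LambdaLR`; the sketch only makes the quoted hyperparameters concrete.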