HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
Authors: Yongming Rao, Wenliang Zhao, Yansong Tang, Jie Zhou, Ser Nam Lim, Jiwen Lu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on ImageNet classification, COCO object detection and ADE20K semantic segmentation show HorNet outperforms Swin Transformers and ConvNeXt by a significant margin with similar overall architecture and training configurations. |
| Researcher Affiliation | Collaboration | Yongming Rao¹, Wenliang Zhao¹, Yansong Tang¹, Jie Zhou¹, Ser-Nam Lim², Jiwen Lu¹ (¹Tsinghua University, ²Meta AI) |
| Pseudocode | Yes | Figure 2: Overview of the basic building block in HorNet with gnConv. We also provide the detailed implementation of g³Conv (middle) and the PyTorch-style code for an arbitrary order (right). |
| Open Source Code | Yes | Code is available at https://github.com/raoyongming/HorNet. |
| Open Datasets | Yes | We conduct extensive experiments to verify the effectiveness of our method. We present the main results on ImageNet [13] and compare them with various architectures. We also test our models on the downstream dense prediction tasks on the commonly used semantic segmentation benchmark ADE20K [71] and the object detection dataset COCO [38]. |
| Dataset Splits | Yes | We train our HorNet-T/S/B models using the standard ImageNet-1K dataset following common practice. To evaluate the scaling ability of our designs, we further train the HorNet-L models on the ImageNet-22K dataset that contains over 10× images and more categories. |
| Hardware Specification | Yes | The latency is measured with a single NVIDIA RTX 3090 GPU with a batch size of 128. |
| Software Dependencies | No | The paper mentions 'PyTorch-style code' but does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We train the models for 300 epochs with 224×224 input. We train the models for 90 epochs and use a similar data augmentation strategy as in the ImageNet-1K experiments. All the models are trained for 160k iterations using the AdamW [44] optimizer with a global batch size of 16. The image size during training is 512×512 for ImageNet-1K (HorNet-T/S/B) pre-trained models and 640×640 for the ImageNet-22K pre-trained models (HorNet-L). |
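For reference, the recursive gated convolution (gnConv) summarized in the pseudocode row can be sketched as follows. This is a minimal NumPy illustration of the recursion, not the authors' implementation: the learned 1×1 projections and the depthwise convolution are replaced by random placeholder weights, and the channel-split schedule follows the doubling pattern described in the paper.

```python
import numpy as np

def gnconv(x, order=3, seed=0):
    """Minimal sketch of an order-n recursive gated convolution (gnConv).

    x: array of shape (C, H, W); C must be divisible by 2**(order-1).
    All weights below are random placeholders standing in for the
    learned parameters, for shape/flow illustration only.
    """
    rng = np.random.default_rng(seed)
    C, H, W = x.shape
    # Channel widths per interaction order: [C/2^{n-1}, ..., C/2, C]
    dims = [C // 2 ** i for i in range(order)][::-1]

    def proj(t, c_out):  # 1x1 conv == channel-mixing matmul
        w = rng.standard_normal((c_out, t.shape[0])) / np.sqrt(t.shape[0])
        return np.einsum('oc,chw->ohw', w, t)

    def dwconv(t, k=3):  # per-channel (depthwise) 3x3 conv, zero-padded
        p = k // 2
        tp = np.pad(t, ((0, 0), (p, p), (p, p)))
        w = rng.standard_normal((t.shape[0], k, k)) / k
        out = np.zeros_like(t)
        for i in range(k):
            for j in range(k):
                out += w[:, i, j][:, None, None] * tp[:, i:i + H, j:j + W]
        return out

    y = proj(x, 2 * C)                        # input projection: C -> 2C
    p, q = y[:dims[0]], y[dims[0]:]           # features + gating branches
    gates = np.split(dwconv(q), np.cumsum(dims)[:-1], axis=0)
    h = p * gates[0]                          # 1st-order spatial interaction
    for i in range(1, order):                 # recursively raise the order
        h = proj(h, dims[i]) * gates[i]
    return proj(h, C)                         # output projection back to C

x = np.random.default_rng(1).standard_normal((16, 8, 8))
print(gnconv(x, order=3).shape)  # (16, 8, 8)
```

The key property the sketch preserves is that each recursion step multiplies the current features by a depthwise-convolved gating branch of twice the width, so an order-n block captures n-th order spatial interactions while the output keeps the input's channel count.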