HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions

Authors: Yongming Rao, Wenliang Zhao, Yansong Tang, Jie Zhou, Ser Nam Lim, Jiwen Lu

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on ImageNet classification, COCO object detection, and ADE20K semantic segmentation show that HorNet outperforms Swin Transformers and ConvNeXt by a significant margin with similar overall architectures and training configurations.
Researcher Affiliation Collaboration Yongming Rao1 Wenliang Zhao1 Yansong Tang1 Jie Zhou1 Ser-Nam Lim2 Jiwen Lu1 1Tsinghua University 2Meta AI
Pseudocode Yes Figure 2: Overview of the basic building block in HorNet with gnConv. We also provide the detailed implementation of g3Conv (middle) and the PyTorch-style code for an arbitrary order (right).
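The gnConv building block referenced above can be sketched as follows. This is a minimal reimplementation inferred from the paper's description of recursive gated convolutions, not the authors' released code; the 7×7 depthwise kernel and the halving channel schedule are assumptions based on the paper's figures.

```python
import torch
import torch.nn as nn


class gnConv(nn.Module):
    """Sketch of recursive gated convolution (gnConv) for an arbitrary order n.

    Assumptions: channel widths double toward the final order
    ([dim/2^(n-1), ..., dim/2, dim]) and gating branches use a 7x7
    depthwise convolution.
    """

    def __init__(self, dim, order=3):
        super().__init__()
        self.order = order
        # Channel widths per order, smallest first: [dim/2^(n-1), ..., dim]
        self.dims = [dim // 2 ** i for i in range(order)][::-1]
        # Input projection emits p0 plus all gating branches;
        # dims[0] + sum(dims) works out to exactly 2 * dim.
        self.proj_in = nn.Conv2d(dim, 2 * dim, kernel_size=1)
        total = sum(self.dims)
        # Depthwise conv applied once over all gating branches jointly
        self.dwconv = nn.Conv2d(total, total, kernel_size=7,
                                padding=3, groups=total)
        # 1x1 convs that lift the running feature between consecutive orders
        self.pws = nn.ModuleList(
            nn.Conv2d(self.dims[i], self.dims[i + 1], kernel_size=1)
            for i in range(order - 1)
        )
        self.proj_out = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):
        y = self.proj_in(x)
        p0, q = torch.split(y, (self.dims[0], sum(self.dims)), dim=1)
        qs = torch.split(self.dwconv(q), self.dims, dim=1)
        # Recursive gating: multiply the running feature by each branch,
        # raising the order of spatial interactions at every step
        x = p0 * qs[0]
        for i in range(self.order - 1):
            x = self.pws[i](x) * qs[i + 1]
        return self.proj_out(x)


# Shape check: a gnConv block preserves the input's shape
block = gnConv(dim=16, order=3)
out = block(torch.randn(2, 16, 8, 8))
print(out.shape)  # torch.Size([2, 16, 8, 8])
```

Setting `order=3` reproduces the g3Conv variant shown in the paper's Figure 2; larger orders only add more gating branches and 1×1 transitions.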
Open Source Code Yes Code is available at https://github.com/raoyongming/HorNet.
Open Datasets Yes We conduct extensive experiments to verify the effectiveness of our method. We present the main results on ImageNet [13] and compare them with various architectures. We also test our models on the downstream dense prediction tasks on the commonly used semantic segmentation benchmark ADE20K [71] and object detection dataset COCO [38].
Dataset Splits Yes We train our HorNet-T/S/B models using the standard ImageNet-1K dataset following common practice. To evaluate the scaling ability of our designs, we further train the HorNet-L models on the ImageNet-22K dataset that contains over 10 million images and more categories.
Hardware Specification Yes The latency is measured with a single NVIDIA RTX 3090 GPU with a batch size of 128.
Software Dependencies No The paper mentions 'PyTorch-style code' but does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup Yes We train the models for 300 epochs with 224×224 input. We train the models for 90 epochs and use a similar data augmentation strategy as in the ImageNet-1K experiments. All the models are trained for 160k iterations using the AdamW [44] optimizer with a global batch size of 16. The image size during training is 512×512 for ImageNet-1K (HorNet-T/S/B) pre-trained models and 640×640 for the ImageNet-22K pre-trained models (HorNet-L).
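The quoted fine-tuning setup (AdamW, global batch size 16, 512×512 or 640×640 crops) can be sketched as a minimal PyTorch step. The stand-in model, learning rate, and weight decay below are illustrative assumptions; the report only quotes the optimizer, batch size, iteration count, and crop sizes.

```python
import torch

# Hypothetical tiny model standing in for a HorNet backbone
model = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1)

# AdamW, as quoted for the 160k-iteration ADE20K runs;
# lr and weight_decay are placeholder assumptions
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)

# 512x512 crops for ImageNet-1K pre-trained backbones (HorNet-T/S/B);
# ImageNet-22K pre-trained HorNet-L would use 640x640 instead
x = torch.randn(16, 3, 512, 512)  # global batch size of 16

optimizer.zero_grad()
loss = model(x).mean()
loss.backward()
optimizer.step()
```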