HRFormer: High-Resolution Vision Transformer for Dense Prediction

Authors: Yuhui Yuan, Rao Fu, Lang Huang, Weihong Lin, Chao Zhang, Xilin Chen, Jingdong Wang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on image classification, pose estimation, and semantic segmentation tasks, and achieve competitive performance on various benchmarks. For example, HRFormer-B gains +1.0% top-1 accuracy on ImageNet classification over DeiT-B [42] with 40% fewer parameters and 20% fewer FLOPs. HRFormer-B gains 0.9% AP over HRNet-W48 [41] on COCO val set with 32% fewer parameters and 19% fewer FLOPs. HRFormer-B + OCR gains +1.2% and +2.0% mIoU over HRNet-W48 + OCR [55] with 25% fewer parameters and slightly more FLOPs on PASCAL-Context test and COCO-Stuff test, respectively.
Researcher Affiliation | Collaboration | 1 University of Chinese Academy of Sciences; 2 Institute of Computing Technology, CAS; 3 Peking University; 4 Microsoft Research Asia; 5 Baidu
Pseudocode | No | The paper includes diagrams illustrating the HRFormer block (Figure 1) and architecture (Figure 2), but it does not provide any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at: https://github.com/HRNet/HRFormer.
Open Datasets | Yes | We train our model on the COCO train2017 dataset, including 57K images and 150K person instances. We evaluate our approach on the val2017 set and test-dev2017 set, containing 5K images and 20K images, respectively.
Dataset Splits | Yes | We train our model on the COCO train2017 dataset, including 57K images and 150K person instances. We evaluate our approach on the val2017 set and test-dev2017 set, containing 5K images and 20K images, respectively.
Hardware Specification | Yes | Each HRFormer experiment on the COCO pose estimation task takes 8 32G-V100 GPUs. Each HRFormer + OCR experiment on Cityscapes takes 8 32G-V100 GPUs. HRFormer-T and HRFormer-S require 8 32G-V100 GPUs, and HRFormer-B requires 32 32G-V100 GPUs.
Software Dependencies | No | The paper mentions 'mmpose [8]' and 'AdamW' as part of the training settings, but does not provide specific version numbers for these or any other software libraries or frameworks (e.g., PyTorch, TensorFlow, Python version) used for the experiments.
Experiment Setup | Yes | We set the initial learning rate as 0.0001, weight decay as 0.01, crop size as 1024 × 512, batch size as 8, and training iterations as 80K by default. (See the sketch after the table for how these settings fit together.)
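
The last row's training recipe maps onto a standard PyTorch loop. Below is a minimal sketch under stated assumptions: the model is a 1x1-conv stand-in rather than the released HRFormer, and the polynomial learning-rate decay and the 19-class label space are assumptions, not settings quoted in the table. Only AdamW and the hyper-parameters from the rows above (LR 0.0001, weight decay 0.01, batch size 8, crop 1024 × 512, 80K iterations) come from the report.

```python
# Minimal sketch of the quoted training recipe, assuming a PyTorch setup.
# The model is a placeholder, NOT the released HRFormer; the poly LR decay
# and the 19-class label space are assumptions, not quoted settings.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Conv2d(3, 19, kernel_size=1)  # placeholder segmentation head

base_lr = 1e-4          # "initial learning rate as 0.0001"
weight_decay = 0.01     # "weight decay as 0.01"
batch_size = 8          # "batch size as 8"
total_iters = 80_000    # "training iterations as 80K"

optimizer = AdamW(model.parameters(), lr=base_lr, weight_decay=weight_decay)
# Polynomial decay (power 0.9) is assumed; the table does not state the schedule.
scheduler = LambdaLR(optimizer, lr_lambda=lambda it: (1.0 - it / total_iters) ** 0.9)

for it in range(total_iters):
    # Dummy batch of 1024 x 512 crops ("crop size as 1024 × 512").
    images = torch.randn(batch_size, 3, 512, 1024)
    targets = torch.randint(0, 19, (batch_size, 512, 1024))
    loss = torch.nn.functional.cross_entropy(model(images), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
    break  # one illustrative step; a real run iterates to total_iters
```

The released repository and the mmpose-based pipeline mentioned above remain the authoritative references; this sketch only shows how the quoted hyper-parameters combine in a single optimizer and schedule.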