HRFormer: High-Resolution Vision Transformer for Dense Prediction
Authors: Yuhui Yuan, Rao Fu, Lang Huang, Weihong Lin, Chao Zhang, Xilin Chen, Jingdong Wang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on image classification, pose estimation, and semantic segmentation tasks, and achieve competitive performance on various benchmarks. For example, HRFormer-B gains +1.0% top-1 accuracy on ImageNet classification over DeiT-B [42] with 40% fewer parameters and 20% fewer FLOPs. HRFormer-B gains 0.9% AP over HRNet-W48 [41] on the COCO val set with 32% fewer parameters and 19% fewer FLOPs. HRFormer-B + OCR gains +1.2% and +2.0% mIoU over HRNet-W48 + OCR [55] with 25% fewer parameters and slightly more FLOPs on PASCAL-Context test and COCO-Stuff test, respectively. |
| Researcher Affiliation | Collaboration | 1 University of Chinese Academy of Sciences; 2 Institute of Computing Technology, CAS; 3 Peking University; 4 Microsoft Research Asia; 5 Baidu |
| Pseudocode | No | The paper includes diagrams illustrating the HRFormer block (Figure 1) and the overall architecture (Figure 2), but it does not provide structured pseudocode or algorithm blocks. (A hedged sketch of the block structure follows the table below.) |
| Open Source Code | Yes | Code is available at: https://github.com/HRNet/HRFormer. |
| Open Datasets | Yes | We train our model on the COCO train2017 dataset, which includes 57K images and 150K person instances. We evaluate our approach on the val2017 set and the test-dev2017 set, containing 5K and 20K images, respectively. |
| Dataset Splits | Yes | We train our model on the COCO train2017 dataset, which includes 57K images and 150K person instances. We evaluate our approach on the val2017 set and the test-dev2017 set, containing 5K and 20K images, respectively. |
| Hardware Specification | Yes | Each HRFormer experiment on the COCO pose estimation task takes 8 32GB V100 GPUs. Each HRFormer + OCR experiment on Cityscapes takes 8 32GB V100 GPUs. HRFormer-T and HRFormer-S require 8 32GB V100 GPUs, and HRFormer-B requires 32 32GB V100 GPUs. |
| Software Dependencies | No | The paper mentions 'mmpose [8]' and 'AdamW' as part of the training settings, but does not provide version numbers for these or for any other software libraries or frameworks (e.g., PyTorch, TensorFlow, Python) used in the experiments. |
| Experiment Setup | Yes | We set the initial learning rate to 0.0001, weight decay to 0.01, crop size to 1024×512, batch size to 8, and training iterations to 80K by default. (See the training-loop sketch after the table.) |
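
Since the paper specifies the HRFormer block only pictorially (Figure 1), the following is a minimal PyTorch sketch of its published structure: local-window multi-head self-attention followed by a feed-forward network that inserts a 3×3 depth-wise convolution. The class names, window size, normalization placement, and channel dimensions here are illustrative assumptions, not the authors' reference implementation (which lives in the linked repository).

```python
# Hypothetical sketch of an HRFormer-style block: local-window self-attention
# plus an FFN containing a 3x3 depth-wise convolution. Names, window size, and
# normalization placement are assumptions; see the official repo for the real code.
import torch
import torch.nn as nn


class LocalWindowAttention(nn.Module):
    """Multi-head self-attention computed independently inside non-overlapping windows."""

    def __init__(self, dim: int, num_heads: int, window_size: int = 7):
        super().__init__()
        self.window_size = window_size
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape  # H and W are assumed divisible by window_size
        s = self.window_size
        # Partition the feature map into (s x s) windows, each a short token sequence.
        win = (x.view(b, c, h // s, s, w // s, s)
                .permute(0, 2, 4, 3, 5, 1)
                .reshape(-1, s * s, c))
        win = self.norm(win)
        out, _ = self.attn(win, win, win)
        # Reverse the window partition back to a (B, C, H, W) feature map.
        return (out.reshape(b, h // s, w // s, s, s, c)
                   .permute(0, 5, 1, 3, 2, 4)
                   .reshape(b, c, h, w))


class ConvFFN(nn.Module):
    """Pointwise-conv FFN with a 3x3 depth-wise convolution in the hidden layer."""

    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        hidden = dim * expansion
        self.net = nn.Sequential(
            nn.BatchNorm2d(dim),  # normalization placement is an assumption
            nn.Conv2d(dim, hidden, kernel_size=1),
            nn.GELU(),
            # Depth-wise: groups == channels, so each channel is filtered separately;
            # the 3x3 kernel spreads information across neighboring attention windows.
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1, groups=hidden),
            nn.GELU(),
            nn.Conv2d(hidden, dim, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class HRFormerBlock(nn.Module):
    def __init__(self, dim: int = 32, num_heads: int = 1, window_size: int = 7):
        super().__init__()
        self.attn = LocalWindowAttention(dim, num_heads, window_size)
        self.ffn = ConvFFN(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(x)    # residual around local-window attention
        return x + self.ffn(x)  # residual around the depth-wise-conv FFN


if __name__ == "__main__":
    block = HRFormerBlock(dim=32, num_heads=2, window_size=7)
    print(block(torch.randn(1, 32, 28, 28)).shape)  # torch.Size([1, 32, 28, 28])
```

The depth-wise convolution is what lets information flow between the non-overlapping attention windows, which is the mechanism the paper's Figure 1 illustrates.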
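
To make the quoted default settings concrete, here is a hedged training-loop sketch in plain PyTorch using exactly those numbers (AdamW, initial learning rate 0.0001, weight decay 0.01, 1024×512 crops, batch size 8, 80K iterations). The model and data are stand-ins for HRFormer + OCR and the real benchmark crops; the quoted settings do not name a learning-rate schedule, so none is applied.

```python
# Hedged sketch of the quoted default training settings. The model and data
# below are placeholders; real runs train HRFormer + OCR on benchmark crops.
import torch
import torch.nn as nn

model = nn.Conv2d(3, 19, kernel_size=1)  # stand-in for the segmentation network

# AdamW with the quoted initial learning rate and weight decay.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

total_iters = 80_000         # "training iterations as 80K"
batch_size = 8               # "batch size as 8"
crop_h, crop_w = 512, 1024   # "crop size as 1024x512" (width x height)

for step in range(total_iters):
    # Random tensors stand in for a batch of augmented crops and label maps.
    images = torch.randn(batch_size, 3, crop_h, crop_w)
    targets = torch.randint(0, 19, (batch_size, crop_h, crop_w))

    loss = nn.functional.cross_entropy(model(images), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```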