GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation

Authors: Chenhongyi Yang, Jiarui Xu, Shalini De Mello, Elliot J. Crowley, Xiaolong Wang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on multiple visual recognition tasks including image classification, object detection, instance segmentation, and semantic segmentation.
Researcher Affiliation | Collaboration | Chenhongyi Yang¹, Jiarui Xu², Shalini De Mello³, Elliot J. Crowley¹, Xiaolong Wang²; ¹School of Engineering, University of Edinburgh; ²UC San Diego; ³NVIDIA
Pseudocode | No | No pseudocode or clearly labeled algorithm block was found.
Open Source Code | Yes | Code and pre-trained models are available at https://github.com/ChenhongyiYang/GPViT.
Open Datasets | Yes | We conduct experiments on multiple visual recognition tasks including image classification, object detection, instance segmentation, and semantic segmentation. Setting: To ensure a fair comparison with previous work, we largely follow the training recipe of Swin Transformer (Liu et al., 2021). We build models using the MMClassification (Contributors, 2020a) toolkit. The models are trained for 300 epochs with a batch size of 2048 using the AdamW optimizer with a weight decay of 0.05 and a peak learning rate of 0.002. A cosine learning rate schedule is used to gradually decrease the learning rate. We use the data augmentations from Liu et al. (2021); these include Mixup (Zhang et al., 2017), CutMix (Yun et al., 2019), Random Erasing (Zhong et al., 2020) and RandAugment (Cubuk et al., 2020).
Dataset Splits | No | The paper uses ImageNet-1K, MS COCO (mini-val), and ADE20K, but does not state the training/validation/test split sizes or percentages needed to reproduce the data partitioning. The COCO 'mini-val' implies a validation set, but its size relative to the train and test sets is not detailed; no split information is given for ImageNet-1K or ADE20K.
Hardware Specification | Yes | The results are evaluated on NVIDIA 2080Ti GPUs.
Software Dependencies | No | The paper mentions the MMClassification, MMDetection, and MMSegmentation toolkits but does not specify their version numbers, nor the versions of other core software dependencies (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | The models are trained for 300 epochs with a batch size of 2048 using the AdamW optimizer with a weight decay of 0.05 and a peak learning rate of 0.002. A cosine learning rate schedule is used to gradually decrease the learning rate.
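The reported recipe (300 epochs, peak learning rate 0.002, cosine decay) can be sketched as a small schedule function. This is a minimal illustration, not the paper's implementation; the minimum learning rate (`min_lr`) and the absence of a warmup phase are assumptions, since the report does not state either.

```python
import math

def cosine_lr(epoch, total_epochs=300, peak_lr=0.002, min_lr=0.0):
    """Cosine learning-rate decay from peak_lr down to min_lr.

    Sketch of the schedule described in the report. min_lr is an
    assumption: the report does not state a learning-rate floor,
    and any warmup phase is omitted here.
    """
    progress = epoch / total_epochs
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# The schedule starts at the peak and decays smoothly to min_lr.
print(cosine_lr(0))    # start of training: 0.002 (peak)
print(cosine_lr(150))  # halfway: 0.001
print(cosine_lr(300))  # end of training: 0.0 (min_lr)
```

In practice this shape is what `torch.optim.lr_scheduler.CosineAnnealingLR` produces when stepped once per epoch with `T_max` set to the total epoch count.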