GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation

Authors: Chenhongyi Yang, Jiarui Xu, Shalini De Mello, Elliot J. Crowley, Xiaolong Wang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on multiple visual recognition tasks including image classification, object detection, instance segmentation, and semantic segmentation.
Researcher Affiliation | Collaboration | Chenhongyi Yang¹, Jiarui Xu², Shalini De Mello³, Elliot J. Crowley¹, Xiaolong Wang²; ¹School of Engineering, University of Edinburgh; ²UC San Diego; ³NVIDIA
Pseudocode | No | No pseudocode or clearly labeled algorithm block was found.
Open Source Code | Yes | Code and pre-trained models are available at https://github.com/ChenhongyiYang/GPViT.
Open Datasets | Yes | We conduct experiments on multiple visual recognition tasks including image classification, object detection, instance segmentation, and semantic segmentation. Setting: To ensure a fair comparison with previous work, we largely follow the training recipe of Swin Transformer (Liu et al., 2021). We build models using the MMClassification (Contributors, 2020a) toolkit. The models are trained for 300 epochs with a batch size of 2048 using the AdamW optimizer with a weight decay of 0.05 and a peak learning rate of 0.002. A cosine learning rate schedule is used to gradually decrease the learning rate. We use the data augmentations from Liu et al. (2021); these include Mixup (Zhang et al., 2017), CutMix (Yun et al., 2019), Random Erasing (Zhong et al., 2020) and RandAugment (Cubuk et al., 2020).
Dataset Splits | No | The paper uses ImageNet-1K, MS COCO (mini-val), and ADE20K, but does not state the training/validation/test split sizes or percentages needed to reproduce the data partitioning. The COCO 'mini-val' implies a validation set, but its size relative to the train and test sets is not detailed; no split information is given for ImageNet-1K or ADE20K.
Hardware Specification | Yes | The results are evaluated on NVIDIA 2080Ti GPUs.
Software Dependencies | No | The paper mentions the MMClassification, MMDetection, and MMSegmentation toolkits but does not specify their version numbers, nor the versions of other core software dependencies (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | The models are trained for 300 epochs with a batch size of 2048 using the AdamW optimizer with a weight decay of 0.05 and a peak learning rate of 0.002. A cosine learning rate schedule is used to gradually decrease the learning rate.
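The reported recipe (300 epochs, peak learning rate 0.002, cosine decay) can be sketched as a small schedule function. This is a minimal illustration, not the paper's implementation; the minimum learning rate (`min_lr`) and the absence of a warmup phase are assumptions, since the report does not state either.

```python
import math

def cosine_lr(epoch, total_epochs=300, peak_lr=0.002, min_lr=0.0):
    """Cosine learning-rate decay from peak_lr down to min_lr.

    Sketch of the schedule described in the report. min_lr is an
    assumption: the report does not state a learning-rate floor,
    and any warmup phase is omitted here.
    """
    progress = epoch / total_epochs
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# The schedule starts at the peak and decays smoothly to min_lr.
print(cosine_lr(0))    # start of training: 0.002 (peak)
print(cosine_lr(150))  # halfway: 0.001
print(cosine_lr(300))  # end of training: 0.0 (min_lr)
```

In practice this shape is what `torch.optim.lr_scheduler.CosineAnnealingLR` produces when stepped once per epoch with `T_max` set to the total epoch count.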