Glance-and-Gaze Vision Transformer

Authors: Qihang Yu, Yingda Xia, Yutong Bai, Yongyi Lu, Alan L. Yuille, Wei Shen

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically demonstrate our method achieves consistently superior performance over previous state-of-the-art Transformers on various vision tasks and benchmarks," and the section titles "4 Experiments", "4.1 ImageNet Classification", "4.2 ADE20K Semantic Segmentation", "4.3 COCO Object Detection", "4.4 Ablation Studies".
Researcher Affiliation | Academia | 1 Department of Computer Science, Johns Hopkins University; 2 MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University.
Pseudocode | No | No structured pseudocode or algorithm blocks (e.g., labeled "Algorithm" or "Pseudocode") were found in the paper.
Open Source Code | No | The paper provides neither a link to source code for the described methodology nor an explicit statement about its release.
Open Datasets | Yes | ImageNet [10] classification, COCO [23] object detection, and ADE20K [48] semantic segmentation.
Dataset Splits | Yes | "ImageNet-1K [10] classification task, which contains 1.28M training images and 50K validation images for 1000 classes"; "ADE20K [48] is a challenging semantic segmentation dataset, containing 20K images for training and 2K images for validation"; and "COCO dataset [23], which contains 118K, 5K, 20K images for training, validation and test respectively."
Hardware Specification | Yes | The evaluation is done with a single NVIDIA Tesla V100-SXM2-16GB GPU.
Software Dependencies | No | The paper mentions using MMSegmentation [8], MMDetection [4], and the AdamW [25] optimizer, but does not provide version numbers for these software components.
Experiment Setup | Yes | For ImageNet: "We use AdamW [25] optimizer for 300 epochs with cosine learning rate decay including 20 epochs for linear warm-up. The training batch size is 1024 with 8 GPUs. Initial learning rate starts at 0.001, and weight decay is 0.05." For ADE20K: "We use AdamW [25] with a learning rate starting at 6×10⁻⁵, weight decay of 0.01, batch size of 16, crop size of 512×512. The learning rate schedule contains a warmup of 1500 iterations and linear learning rate decay. The training is conducted with 8 GPUs and the training procedure lasts for 160K iterations in total."
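The ImageNet schedule quoted above (300 epochs, 20-epoch linear warm-up, cosine decay from a base learning rate of 0.001) can be sketched as a plain function. This is a minimal illustration, not the authors' code: the decay-to-zero minimum learning rate is an assumption, since the paper's reported setup does not state a floor.

```python
import math

def learning_rate(epoch, base_lr=1e-3, warmup_epochs=20, total_epochs=300):
    """Per-epoch LR for linear warm-up followed by cosine decay.

    Assumption: warm-up ramps linearly to base_lr, and the cosine
    phase decays to 0 by the final epoch (the minimum LR is not
    specified in the paper).
    """
    if epoch < warmup_epochs:
        # Linear warm-up over the first 20 epochs.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay over the remaining 280 epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

For example, `learning_rate(19)` returns the full base rate of 0.001 at the end of warm-up, and the value shrinks toward zero as the epoch approaches 300.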