Glance-and-Gaze Vision Transformer
Authors: Qihang Yu, Yingda Xia, Yutong Bai, Yongyi Lu, Alan L. Yuille, Wei Shen
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We empirically demonstrate our method achieves consistently superior performance over previous state-of-the-art Transformers on various vision tasks and benchmarks." See also the section titles: 4 Experiments, 4.1 ImageNet Classification, 4.2 ADE20K Semantic Segmentation, 4.3 COCO Object Detection, 4.4 Ablation Studies. |
| Researcher Affiliation | Academia | 1 Department of Computer Science, Johns Hopkins University; 2 MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University |
| Pseudocode | No | No structured pseudocode or algorithm blocks (e.g., labeled 'Algorithm' or 'Pseudocode') were found in the paper. |
| Open Source Code | No | The paper does not provide a specific link to source code for the described methodology nor an explicit statement about its release. |
| Open Datasets | Yes | ImageNet [10] classification, COCO [23] object detection, and ADE20K [48] semantic segmentation |
| Dataset Splits | Yes | "ImageNet-1K [10] classification task, which contains 1.28M training images and 50K validation images for 1000 classes"; "ADE20K [48] is a challenging semantic segmentation dataset, containing 20K images for training and 2K images for validation"; "COCO dataset [23], which contains 118K, 5K, 20K images for training, validation and test respectively." |
| Hardware Specification | Yes | The evaluation is done with a single NVIDIA Tesla V100-SXM2-16GB GPU. |
| Software Dependencies | No | The paper mentions using MMSegmentation [8], MMDetection [4], and the AdamW [25] optimizer, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | For ImageNet: "We use AdamW [25] optimizer for 300 epochs with cosine learning rate decay including 20 epochs for linear warm-up. The training batch size is 1024 with 8 GPUs. Initial learning rate starts at 0.001, and weight decay is 0.05." For ADE20K: "We use AdamW [25] with a learning rate starting at 6×10⁻⁵, weight decay of 0.01, batch size of 16, crop size of 512×512. The learning rate schedule contains a warmup of 1500 iterations and linear learning rate decay. The training is conducted with 8 GPUs and the training procedure lasts for 160K iterations in total." |
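The ImageNet schedule quoted above (300 epochs, 20 epochs of linear warmup, cosine decay from a base learning rate of 0.001) can be sketched as a small helper function. This is a minimal sketch for illustration only: the function name `lr_at` and the per-epoch (rather than per-iteration) granularity are assumptions, not details from the paper.

```python
import math

def lr_at(epoch, base_lr=1e-3, total_epochs=300, warmup_epochs=20):
    """Learning rate at a given epoch: linear warmup then cosine decay.

    Hyperparameters match the paper's reported ImageNet setup; stepping
    per epoch rather than per iteration is an assumption made here.
    """
    if epoch < warmup_epochs:
        # Linear warmup: ramps from base_lr / warmup_epochs to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr toward 0 over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))
```

Under this sketch the rate peaks at 0.001 at the end of warmup (epoch 20) and decays smoothly toward zero by epoch 300.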