Peripheral Vision Transformer

Authors: Juhong Min, Yucheng Zhao, Chong Luo, Minsu Cho

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed network, dubbed PerViT, on ImageNet-1K and systematically investigate the inner workings of the model for machine perception... The performance improvements in image classification over the baselines across different model sizes demonstrate the efficacy of the proposed method. (Abstract) and In this section, we first investigate the inner workings of PerViT trained on ImageNet-1K classification dataset to examine how it benefits from the proposed peripheral projections and initialization, and then compare the method with previous state of the arts under comparable settings. (Section 4)
Researcher Affiliation | Collaboration | Juhong Min (1), Yucheng Zhao (2,3), Chong Luo (2), Minsu Cho (1); 1: Pohang University of Science and Technology (POSTECH), 2: Microsoft Research Asia (MSRA), 3: University of Science and Technology of China (USTC)
Pseudocode | No | The paper describes the mathematical formulations and procedures but does not include any figure, block, or section explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] We include the code and instructions for reproduction in the supplementary. (Checklist, Section 3a)
Open Datasets | Yes | Our experiments focus on image classification on ImageNet-1K [13]. (Section 4) and We are using only publicly available, benchmark datasets. (Checklist, Section 4d)
Dataset Splits | Yes | Our experiments focus on image classification on ImageNet-1K [13]. Following training recipes of DeiT [58], we train our model on ImageNet-1K from scratch... We refer to the supplementary for additional details. (Section 4) and Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] We specify all the training details in the supplementary. (Checklist, Section 3b) (A data-loading sketch for the standard ImageNet-1K splits follows the table.)
Hardware Specification | Yes | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] We include the amount and type of resources used for training in the supplementary. (Checklist, Section 3d)
Software Dependencies | Yes | As our baseline code, we use DeiT [58] which is implemented with PyTorch [48] framework, all of which are open-sourced. (Checklist, Section 4a)
Experiment Setup | Yes | Following training recipes of DeiT [58], we train our model on ImageNet-1K from scratch with batch size of 1024, learning rate of 0.001 using AdamW [42] optimizer, cosine learning rate decay scheduler, and the same data augmentations [14] for 300 epochs, including warm-up epochs. We evaluate our model with three different sizes, e.g., Tiny (T), Small (S), and Medium (M). We use stochastic depths of 0.0, 0.1, and 0.2 for T, S, and M respectively. We refer to the supplementary for additional details. (Section 4) (A training-configuration sketch of this recipe follows the table.)
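
The dataset splits referenced above are the standard ImageNet-1K ones (roughly 1.28M training images and 50K validation images). The following is a minimal torchvision loading sketch, assuming the conventional one-subdirectory-per-class layout and a hypothetical `imagenet/` root path; the full DeiT augmentation suite (reference [14] in the paper) is not reproduced here.

```python
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Standard ImageNet statistics for input normalization.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    # DeiT's full recipe adds further augmentations (the paper's ref. [14]);
    # omitted in this sketch.
    transforms.ToTensor(),
    normalize,
])
val_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])

# "imagenet/train" and "imagenet/val" are assumed paths, one class per
# subdirectory, as in the standard ImageNet-1K folder layout.
train_set = datasets.ImageFolder("imagenet/train", transform=train_tf)
val_set = datasets.ImageFolder("imagenet/val", transform=val_tf)

train_loader = DataLoader(train_set, batch_size=1024, shuffle=True, num_workers=8)
val_loader = DataLoader(val_set, batch_size=1024, shuffle=False, num_workers=8)
```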
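
The quoted training recipe also maps directly onto a short PyTorch configuration. This is a minimal, self-contained sketch of the hyperparameters named in Section 4, not the authors' released code: the stand-in model is an arbitrary small classifier, the synthetic batch replaces the real ImageNet-1K loader, and details the paper defers to its supplementary (weight decay, warm-up schedule) are only noted in comments.

```python
import torch
import torch.nn as nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# Hyperparameters quoted in Section 4 (DeiT training recipe).
BATCH_SIZE = 1024
LR = 1e-3
EPOCHS = 300
NUM_CLASSES = 1000  # ImageNet-1K

# Stochastic depth per model size, as quoted: Tiny / Small / Medium.
DROP_PATH = {"T": 0.0, "S": 0.1, "M": 0.2}

# Stand-in classifier; the real PerViT definitions (which consume a
# drop-path rate like DROP_PATH above) live in the authors' supplementary code.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, NUM_CLASSES),
)

# AdamW with lr 0.001 and cosine decay, per the quoted recipe; the exact
# weight decay is deferred to the paper's supplementary.
optimizer = AdamW(model.parameters(), lr=LR)
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)
criterion = nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    # One small synthetic batch per epoch keeps the sketch runnable;
    # real training iterates over ImageNet-1K with batch size 1024.
    images = torch.randn(8, 3, 224, 224)
    labels = torch.randint(0, NUM_CLASSES, (8,))
    optimizer.zero_grad()
    criterion(model(images), labels).backward()
    optimizer.step()
    scheduler.step()  # (the recipe also includes warm-up epochs, omitted here)
```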