Green Hierarchical Vision Transformer for Masked Image Modeling

Authors: Lang Huang, Shan You, Mingkai Zheng, Fei Wang, Chen Qian, Toshihiko Yamasaki

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We conduct experiments on the ImageNet-1K [60] (BSD 3-Clause License) image classification dataset and MS-COCO [47] (CC BY 4.0 License) object detection/instance segmentation dataset. |
| Researcher Affiliation | Collaboration | ¹The University of Tokyo; ²SenseTime Research; ³The University of Sydney |
| Pseudocode | Yes | Algorithm 1 Optimal Grouping |
| Open Source Code | Yes | Corresponding author. Code and pre-trained models: https://github.com/LayneH/GreenMIM. |
| Open Datasets | Yes | We conduct experiments on the ImageNet-1K [60] (BSD 3-Clause License) image classification dataset and MS-COCO [47] (CC BY 4.0 License) object detection/instance segmentation dataset. |
| Dataset Splits | Yes | We fine-tune the pre-trained models on the ImageNet-1K dataset and report the results on the validation set in Table 2. All models are fine-tuned on the MS-COCO [47] 2017 train split (~118k images) and finally evaluated on the val split (~5k images). |
| Hardware Specification | Yes | All the experiments of our method are performed on a single machine with eight 32GB Tesla V100 GPUs. |
| Software Dependencies | Yes | CUDA 10.1, PyTorch [54] 1.8 |
| Experiment Setup | Yes | The models are trained for 100/200/400/800 epochs with a total batch size of 2,048. We use the AdamW optimizer [41] with the cosine annealing schedule [50]. We set the base learning rate to 1.5e-4, the weight decay to 0.05, the AdamW hyper-parameters to β1 = 0.9, β2 = 0.999, and the number of warmup epochs to 40 with an initial base learning rate of 1.5e-7. |
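The quoted setup (linear warmup from 1.5e-7 to a base rate of 1.5e-4 over 40 epochs, then cosine annealing) can be sketched as a per-epoch schedule. This is a minimal stdlib-only illustration, not the paper's code; the minimum learning rate at the end of annealing (here 0) and per-epoch rather than per-step updates are assumptions.

```python
import math

def lr_at_epoch(epoch: int,
                total_epochs: int = 800,      # longest schedule reported
                base_lr: float = 1.5e-4,
                warmup_epochs: int = 40,
                warmup_start_lr: float = 1.5e-7,
                min_lr: float = 0.0) -> float:  # final LR is an assumption
    """Linear warmup followed by cosine annealing, per the quoted setup."""
    if epoch < warmup_epochs:
        # Linear ramp from warmup_start_lr up to base_lr.
        return warmup_start_lr + (base_lr - warmup_start_lr) * epoch / warmup_epochs
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

For the 100/200/400/800-epoch runs, only `total_epochs` would change; the warmup length and learning rates quoted above stay fixed.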