Green Hierarchical Vision Transformer for Masked Image Modeling
Authors: Lang Huang, Shan You, Mingkai Zheng, Fei Wang, Chen Qian, Toshihiko Yamasaki
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on the ImageNet-1K [60] (BSD 3-Clause License) image classification dataset and MS-COCO [47] (CC BY 4.0 License) object detection/instance segmentation dataset. |
| Researcher Affiliation | Collaboration | 1The University of Tokyo; 2SenseTime Research; 3The University of Sydney |
| Pseudocode | Yes | Algorithm 1 Optimal Grouping |
| Open Source Code | Yes | Code and pre-trained models: https://github.com/LayneH/GreenMIM. |
| Open Datasets | Yes | We conduct experiments on the ImageNet-1K [60] (BSD 3-Clause License) image classification dataset and MS-COCO [47] (CC BY 4.0 License) object detection/instance segmentation dataset. |
| Dataset Splits | Yes | We fine-tune the pre-trained models on the ImageNet-1K dataset and report the results on the validation set in Table 2. All models are fine-tuned on the MS-COCO [47] 2017 train split (~118k images) and finally evaluated on the val split (~5k images). |
| Hardware Specification | Yes | All the experiments of our method are performed on a single machine with eight 32 GB Tesla V100 GPUs. |
| Software Dependencies | Yes | CUDA 10.1, PyTorch [54] 1.8 |
| Experiment Setup | Yes | The models are trained for 100/200/400/800 epochs with a total batch size of 2,048. We use the AdamW optimizer [41] with the cosine annealing schedule [50]. We set the base learning rate to 1.5e-4, the weight decay to 0.05, the AdamW hyper-parameters to β1 = 0.9, β2 = 0.999, and the number of warmup epochs to 40 with an initial base learning rate of 1.5e-7. |
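The pre-training recipe quoted above (AdamW, cosine annealing, linear warmup from 1.5e-7 to a base learning rate of 1.5e-4 over 40 epochs, weight decay 0.05) can be expressed as a short PyTorch configuration. The sketch below is illustrative only: the placeholder `model`, the per-epoch learning-rate update, and the schedule helper `lr_at_epoch` are assumptions for demonstration, not code from the authors' GreenMIM release.

```python
# Minimal sketch of the reported pre-training optimizer/schedule.
# Placeholder model and schedule helper are assumptions, not the paper's code.
import math
import torch

total_epochs = 800          # paper reports 100/200/400/800-epoch schedules
warmup_epochs = 40
base_lr = 1.5e-4
warmup_start_lr = 1.5e-7
weight_decay = 0.05

model = torch.nn.Linear(768, 768)   # stand-in for the actual encoder/decoder

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=base_lr,
    betas=(0.9, 0.999),
    weight_decay=weight_decay,
)

def lr_at_epoch(epoch: float) -> float:
    """Linear warmup to base_lr, then cosine annealing toward zero."""
    if epoch < warmup_epochs:
        frac = epoch / warmup_epochs
        return warmup_start_lr + frac * (base_lr - warmup_start_lr)
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

# Update the learning rate once per epoch inside the training loop.
for epoch in range(total_epochs):
    for group in optimizer.param_groups:
        group["lr"] = lr_at_epoch(epoch)
    # ... forward/backward/step over the 2,048-sample global batch ...
```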