Parameter-Inverted Image Pyramid Networks

Authors: Xizhou Zhu, Xue Yang, Zhaokai Wang, Hao Li, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that the PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification, compared to traditional image pyramid methods and single-branch networks, while reducing computational cost. Notably, when applying our method on a large-scale vision foundation model InternViT-6B, we improve its performance by 1%-2% on detection and segmentation with only 40%-60% of the original computation. These results validate the effectiveness of the PIIP approach and provide a new technical direction for future vision computing tasks.
Researcher Affiliation | Collaboration | 1 OpenGVLab, Shanghai AI Laboratory; 2 Tsinghua University; 3 Shanghai Jiao Tong University; 4 The Chinese University of Hong Kong; 5 SenseTime Research
Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | https://github.com/OpenGVLab/PIIP
Open Datasets | Yes | The MS COCO [28] dataset is used to evaluate the performance on object detection and instance segmentation. ...We use UperNet [54] as the basic framework to train on the ADE20K [62] dataset... ImageNet-1K [11].
Dataset Splits | No | The paper mentions validating performance on COCO val2017 and ADE20K, but it does not explicitly provide the specific percentages or counts for training/validation/test splits, nor does it specify the methodology for these splits.
Hardware Specification | Yes | We adopt AdamW [32] optimizer with layer-wise learning rate decay [2] to train the model on 8 NVIDIA A800 GPUs.
Software Dependencies | No | The paper mentions several software components, such as MMDetection, DeiT, UperNet, MMSegmentation, and AdamW, but it does not provide specific version numbers for these components, which are required for a reproducible description of software dependencies.
Experiment Setup | Yes | The total batch size is 16, and the initial learning rate and weight decay are 1e-4 and 0.05. ...The batch size, initial learning rate and weight decay are 1024, 3e-5 and 0.1. The learning rate for the randomly initialized interactions is 10 times the base learning rate, i.e. 3e-4. The other settings mainly follow the fine-tuning recipe of [44] and are provided in the appendix. (Table 11: batch size 1024, epochs 20, optimizer AdamW, weight decay 0.1, learning rate scheduler cosine, initial learning rate 3e-5, warmup epochs 5, mixup 0.8, cutmix 1.0, random erasing 0, auto augment, color jitter 0.3, label smoothing 0.1, dropout, drop path rate 0.4 (ViT-L) / 0.2 (ViT-B) / 0.05 (ViT-S, ViT-T), gradient clip, loss cross entropy.)
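The setup above cites AdamW with layer-wise learning rate decay. A minimal sketch of that scheme (not the authors' code; the decay factor 0.9 and 12-block depth are illustrative assumptions, while the base learning rate 3e-5 comes from the classification recipe quoted above) is:

```python
# Layer-wise learning rate decay: the last (deepest) block keeps the base
# learning rate, and each earlier block is scaled down geometrically.
def layerwise_lrs(base_lr: float, num_layers: int, decay: float) -> list[float]:
    """Per-layer learning rates; index 0 is the earliest (most decayed) layer."""
    # Layer i (0-based) is scaled by decay ** (num_layers - 1 - i).
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]

# Illustrative values: base lr 3e-5 (from the quoted recipe), assumed decay
# factor 0.9, assumed 12 transformer blocks.
lrs = layerwise_lrs(3e-5, 12, 0.9)
```

In practice each per-layer rate would seed a separate parameter group for the AdamW optimizer, so earlier pretrained blocks drift less during fine-tuning than the deeper, more task-specific ones.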