AiluRus: A Scalable ViT Framework for Dense Prediction

Authors: Jin Li, Yaoming Wang, Xiaopeng Zhang, Bowen Shi, Dongsheng Jiang, Chenglin Li, Wenrui Dai, Hongkai Xiong, Qi Tian

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our proposed method on three different datasets and observe promising performance. For example, the "Segmenter ViT-L" model can be accelerated by 48% FPS without fine-tuning, while maintaining the performance. Additionally, our method can be applied to accelerate fine-tuning as well. Experimental results demonstrate that we can save 52% training time while accelerating 2.46× FPS with only a 0.09% performance drop.
Researcher Affiliation | Collaboration | Jin Li¹, Yaoming Wang¹, Xiaopeng Zhang², Bowen Shi¹, Dongsheng Jiang², Chenglin Li¹, Wenrui Dai¹, Hongkai Xiong¹, Qi Tian² (¹Shanghai Jiao Tong University; ²Huawei Cloud)
Pseudocode | No | The paper describes its method using natural language and mathematical equations, but it does not include a formally labeled "Pseudocode" or "Algorithm" block.
Open Source Code | Yes | The code is available at https://github.com/caddyless/ailurus/tree/main.
Open Datasets | Yes
Dataset Splits | Yes | The produced assignments are collected across the ADE20K [37] validation set.
Hardware Specification | Yes | We fine-tune the pre-trained models on 8 V100-32G GPUs and evaluate the FPS on a single V100-32G.
Software Dependencies | No | The paper mentions using "MMSegmentation [5]" as its code base, but it does not provide specific version numbers for this or any other software dependency.
Experiment Setup | Yes | We conducted hyper-parameter ablation experiments on the adaptive resolution strategy presented in Section 3.2 using the ADE20K semantic segmentation benchmark and the officially released Segmenter ViT-L/16 [26] checkpoint. For the neighbor weight hyper-parameter α, we searched its value from 0.6 to 1.0 (1.0 indicates disabling this hyper-parameter), and the results showed that α = 0.9 performed best. Similarly, we searched the value of λ from 0 to 70 (0 indicates not using spatial information), and the results showed that λ = 50 performed best. The ablation results of k indicated that k = 1, i.e., choosing the closest token to calculate the local density, performed best.
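The Experiment Setup row names three hyper-parameters (α, λ, k) without giving the formulas they enter. As a rough illustration of how such knobs could interact, the sketch below implements a DPC-kNN-style token clustering (density-peaks clustering with a k-nearest-neighbour local density) in NumPy. It is not the authors' implementation; the released repository is the reference for that. Several details are assumptions made for illustration: α blends each token with the mean of its 4-connected grid neighbours, λ multiplies a spatial distance (on normalised grid coordinates) that is added to the feature distance, and cluster members are merged by averaging. All function names (`smooth_with_neighbors`, `pairwise_distance`, `dpc_knn_assign`, `merge_tokens`) are hypothetical. Only the boundary behaviours come from the source: α = 1.0 disables neighbour weighting, λ = 0 ignores spatial information, and k sets how many nearest tokens define the local density.

```python
import numpy as np


def smooth_with_neighbors(feats, h, w, alpha):
    """Hypothetical neighbour weighting: blend each token with the mean of its
    4-connected grid neighbours. alpha = 1.0 returns the features unchanged."""
    grid = feats.reshape(h, w, -1)
    padded = np.pad(grid, ((1, 1), (1, 1), (0, 0)), mode="edge")  # replicate borders
    neigh_mean = (padded[:-2, 1:-1] + padded[2:, 1:-1]      # up, down
                  + padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0  # left, right
    return (alpha * grid + (1.0 - alpha) * neigh_mean).reshape(h * w, -1)


def pairwise_distance(feats, coords, lam):
    """Assumed metric: feature distance plus lam times spatial distance.
    lam = 0 ignores spatial information."""
    fd = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    sd = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return fd + lam * sd


def dpc_knn_assign(dist, num_centers, k=1):
    """Density-peaks clustering with a k-nearest-neighbour local density."""
    n = dist.shape[0]
    # k smallest off-diagonal distances (column 0 of the sorted row is the self-distance 0)
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    density = np.exp(-np.mean(knn ** 2, axis=1))
    # Rank by density, breaking ties by index so that "higher density" is strict
    rank = np.empty(n, dtype=int)
    rank[np.argsort(density, kind="stable")] = np.arange(n)
    higher = rank[None, :] > rank[:, None]  # higher[i, j]: token j ranks above token i
    # delta: distance to the nearest higher-ranked token
    delta = np.where(higher, dist, np.inf).min(axis=1)
    top = np.isinf(delta)  # only the top-ranked token has no higher neighbour
    delta[top] = dist[top].max(axis=1)
    # Centers are the tokens with the largest density * delta scores
    centers = np.argsort(-(density * delta))[:num_centers]
    labels = np.argmin(dist[:, centers], axis=1)  # assign every token to its nearest center
    return centers, labels


def merge_tokens(feats, labels, num_centers):
    """Assumed merge rule: average the tokens assigned to each cluster."""
    sums = np.zeros((num_centers, feats.shape[1]))
    np.add.at(sums, labels, feats)
    counts = np.bincount(labels, minlength=num_centers)[:, None]
    return sums / np.maximum(counts, 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h, w, c = 8, 8, 16
    feats = rng.normal(size=(h * w, c))  # stand-in for intermediate ViT tokens
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys, xs], axis=-1).reshape(-1, 2) / max(h, w)
    # Values reported best in the ablation: alpha = 0.9, lambda = 50, k = 1
    dist = pairwise_distance(smooth_with_neighbors(feats, h, w, alpha=0.9), coords, lam=50.0)
    centers, labels = dpc_knn_assign(dist, num_centers=16, k=1)
    print(merge_tokens(feats, labels, 16).shape)  # (16, 16)
```

With k = 1, two mutually nearest tokens receive identical densities, so the sketch breaks density ties by index before measuring the distance to a higher-density token; without that step, both tokens of such a pair would be treated as having no denser neighbour.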