VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding

Authors: Yi Xin, Junlong Du, Qiang Wang, Zhiwen Lin, Ke Yan

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on four dense scene understanding tasks demonstrate the superiority of VMT-Adapter(-Lite), achieving a 3.96% (1.34%) relative improvement compared to single-task full fine-tuning, while utilizing merely 1% (0.36%) trainable parameters of the pre-trained model.
Researcher Affiliation | Collaboration | Yi Xin (1,2), Junlong Du (2), Qiang Wang (2), Zhiwen Lin (2), Ke Yan (2)*. (1) State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; (2) Youtu Lab, Tencent. Emails: xinyi@smail.nju.edu.cn, {jeffdu, albertqwang, xavierzwlin, kerwinyan}@tencent.com
Pseudocode | No | The paper describes its methods using equations and prose, but does not provide structured pseudocode or algorithm blocks.
Open Source Code | No | No explicit statement or link regarding open-source code for the VMT-Adapter method is provided in the paper.
Open Datasets | Yes | To evaluate our proposed approach for multi-task dense scene understanding, we follow the prior works (Vandenhende et al. 2021; Liu et al. 2022) and conduct experiments on the PASCAL-Context (Vandenhende, Georgoulis, and Van Gool 2020) dataset.
Dataset Splits | Yes | PASCAL-Context comprises 4,998 and 5,105 images in the training and testing splits, respectively.
Hardware Specification | Yes | We conduct all experiments using the PyTorch toolkit on 4 NVIDIA V100 GPUs.
Software Dependencies | No | We conduct all experiments using the PyTorch toolkit on 4 NVIDIA V100 GPUs. (PyTorch is mentioned, but without a specific version number.)
Experiment Setup | Yes | Specifically, we use batch size 12 and train for 60 epochs for each task. We employ the Adam optimizer with a learning rate of 1e-4 and a weight decay of 1e-4, and the learning rate is linearly decreased with respect to the iteration.
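The quoted setup maps onto a standard PyTorch training loop. Below is a minimal sketch assuming only the quoted hyperparameters (batch size 12, 60 epochs, Adam with learning rate 1e-4 and weight decay 1e-4, learning rate decreased linearly with respect to the iteration); the model, dataset, and loss are placeholders, since the review above notes no released code is available, so this is not the authors' implementation.

# Minimal sketch of the quoted training setup (illustrative; not the authors' code).
# `model`, `train_dataset`, and `criterion` are assumed placeholders.
import torch
from torch.utils.data import DataLoader

train_loader = DataLoader(train_dataset, batch_size=12, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)

# Linear decay of the learning rate with respect to the iteration.
num_epochs = 60
total_steps = num_epochs * len(train_loader)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: max(0.0, 1.0 - step / total_steps)
)

for epoch in range(num_epochs):
    for images, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
        scheduler.step()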
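For context on the "3.96% (1.34%) relative improvement" quoted under Research Type: multi-task work following the cited Vandenhende et al. (2021) typically reports the average signed per-task relative difference against single-task full fine-tuning. Treating the paper's number as that metric is an assumption, and the task names and scores in the sketch below are made-up placeholders, not results from the paper.

# Hedged sketch of the conventional multi-task relative improvement metric
# (following Vandenhende et al. 2021); placeholder values, not the paper's results.
def relative_improvement(multi_task, single_task, lower_is_better):
    """Average signed relative difference (%) of multi-task vs. single-task scores."""
    deltas = []
    for task in multi_task:
        m, s = multi_task[task], single_task[task]
        sign = -1.0 if lower_is_better[task] else 1.0
        deltas.append(sign * (m - s) / s)
    return 100.0 * sum(deltas) / len(deltas)

# Hypothetical two-task example with made-up numbers:
example = relative_improvement(
    multi_task={"semseg_mIoU": 70.0, "normals_mErr": 14.0},
    single_task={"semseg_mIoU": 68.0, "normals_mErr": 14.5},
    lower_is_better={"semseg_mIoU": False, "normals_mErr": True},
)
print(f"Relative improvement: {example:.2f}%")  # ~ +3.19% in this toy example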