Bi-directional Adapter for Multimodal Tracking

Authors: Bing Cao, Junliang Guo, Pengfei Zhu, Qinghua Hu

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on RGBT234 (Li et al. 2019) and LasHeR (Li et al. 2021) datasets validate the effectiveness of our BAT framework. By training only a few parameters, BAT achieves significant advantages compared with the competing methods.
Researcher Affiliation | Academia | Tianjin Key Lab of Machine Learning, College of Intelligence and Computing, Tianjin University, China {caobing,guojunliang,zhupengfei,huqinghua}@tju.edu.cn
Pseudocode | No | The paper describes the model architecture and method using diagrams and mathematical equations, but it does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available: https://github.com/SparkTempest/BAT.
Open Datasets | Yes | We conduct experiments on two multi-modal tracking datasets: RGBT234 (Li et al. 2019) and LasHeR (Li et al. 2021).
Dataset Splits | No | The paper mentions training on the LasHeR training set but does not specify details about a separate validation set or provide explicit percentages/counts for data splits (e.g., train/validation/test).
Hardware Specification | Yes | We implement our BAT based on PyTorch and train it on 4 NVIDIA RTX A6000 GPUs with a batch size of 32.
Software Dependencies | No | The paper mentions PyTorch and the AdamW optimizer but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We implement our BAT based on PyTorch and train it on 4 NVIDIA RTX A6000 GPUs with a batch size of 32. We follow the hyper-parameter settings of the foundation model for the loss function. The AdamW optimizer (Loshchilov and Hutter 2019) with a weight decay of 10^-4 is adopted, and the learning rate is set to 4 x 10^-4. The fixed parameters of the modal-specific branches in BAT are initialized from the pre-trained foundation model (Ye et al. 2022). Fine-tuning BAT on the LasHeR training set takes 60 epochs (about 8 hours), where each epoch contains 6 x 10^4 sample pairs.
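
For context, the reported configuration maps onto a short PyTorch setup. The sketch below is illustrative only: the DummyBAT class is a placeholder standing in for the actual BAT architecture, and only the optimizer choice, weight decay, learning rate, batch size, and epoch count are taken from the paper.

```python
import torch
from torch import nn
from torch.optim import AdamW

# Placeholder standing in for BAT: a frozen foundation-model branch plus a small
# trainable adapter. The real architecture is described in the paper, not here.
class DummyBAT(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(16, 16)   # stands in for the frozen, pre-trained branch
        self.adapter = nn.Linear(16, 16)    # stands in for the bi-directional adapter
        for p in self.backbone.parameters():
            p.requires_grad = False         # only the adapter parameters are fine-tuned

    def forward(self, x):
        return self.adapter(self.backbone(x))

model = DummyBAT()
trainable = [p for p in model.parameters() if p.requires_grad]

# Hyper-parameters reported in the paper: AdamW, weight decay 1e-4, learning rate 4e-4,
# batch size 32 (across 4 GPUs), 60 epochs of roughly 6e4 sample pairs each.
optimizer = AdamW(trainable, lr=4e-4, weight_decay=1e-4)
batch_size, num_epochs = 32, 60
```

Training only the adapter parameters (the frozen branches are excluded from the optimizer) reflects the paper's claim that BAT achieves its results "by training only a few parameters."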