Bi-directional Adapter for Multimodal Tracking
Authors: Bing Cao, Junliang Guo, Pengfei Zhu, Qinghua Hu
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on RGBT234 (Li et al. 2019) and LasHeR (Li et al. 2021) datasets validate the effectiveness of our BAT framework. By training only a few parameters, BAT achieves significant advantages compared with the competing methods. |
| Researcher Affiliation | Academia | Tianjin Key Lab of Machine Learning, College of Intelligence and Computing, Tianjin University, China {caobing,guojunliang,zhupengfei,huqinghua}@tju.edu.cn |
| Pseudocode | No | The paper describes the model architecture and method using diagrams and mathematical equations, but it does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available: https://github.com/SparkTempest/BAT. |
| Open Datasets | Yes | We conduct experiments on two multi-modal tracking datasets: RGBT234 (Li et al. 2019) and LasHeR (Li et al. 2021) |
| Dataset Splits | No | The paper mentions training on the 'LasHeR training set' but does not specify details about a separate validation set or provide explicit percentages/counts for data splits (e.g., train/validation/test). |
| Hardware Specification | Yes | We implement our BAT based on the Pytorch and train it on 4 NVIDIA RTX A6000 GPUs with a batch size of 32. |
| Software Dependencies | No | The paper mentions "Pytorch" and "AdamW optimizer" but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We implement our BAT based on the Pytorch and train it on 4 NVIDIA RTX A6000 GPUs with a batch size of 32. We follow the hyper-parameters setting of the foundation model in the loss function. The AdamW optimizer (Loshchilov and Hutter 2019) with a weight decay of 10^-4 is adopted, and the learning rate is set to 4 x 10^-4. The fixed parameters of the modal-specific branch in BAT are initialized by the pre-trained foundation model (Ye et al. 2022). The fine-tuning of our BAT on the LasHeR training set takes 60 epochs for 8 hours, where each epoch contains 6 x 10^4 sample pairs. |
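
The experiment-setup row above lists enough hyper-parameters to reconstruct the reported training budget. The sketch below collects those values in one place; the `config` dict and its key names are illustrative (not from the paper), and in practice the values would be passed to `torch.optim.AdamW` and a data loader.

```python
# Hedged sketch of the BAT fine-tuning configuration as reported in the paper.
# All names here are illustrative; only the numeric values come from the table.

config = {
    "batch_size": 32,              # across 4 NVIDIA RTX A6000 GPUs
    "optimizer": "AdamW",          # Loshchilov and Hutter 2019
    "weight_decay": 1e-4,
    "learning_rate": 4e-4,
    "epochs": 60,
    "samples_per_epoch": 6 * 10**4,  # sample pairs per epoch (LasHeR training set)
}

# Total sample pairs seen over the full fine-tuning run.
total_pairs = config["epochs"] * config["samples_per_epoch"]
print(total_pairs)  # 3600000
```

Note that total (3.6 million sample pairs over roughly 8 hours) is what the "60 epochs for 8 hours" figure in the setup row amounts to.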