LRR: Language-Driven Resamplable Continuous Representation against Adversarial Tracking Attacks

Authors: Jianlang Chen, Xuhong Ren, Qing Guo, Felix Juefei-Xu, Di Lin, Wei Feng, Lei Ma, Jianjun Zhao

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted extensive experiments on three public datasets, demonstrating that our method significantly enhances the adversarial robustness of object trackers against four state-of-the-art adversarial attacks. Moreover, our approach maintains high accuracy on clean data, with the adversarial accuracy even matching or surpassing the clean accuracy. For instance, in the VOT2019 results shown in Figure 1 (b), SiamRPN++ with LRR achieves an EAO of 0.283 under the SPARK attack, outperforming the 0.079 EAO achieved by SiamRPN++ without LRR and even surpassing the results on clean data. [...] We conduct a series of experiments to evaluate LRR's defensive efficacy under various previously discussed settings, reporting the average results from three independent trials. Testing datasets. To evaluate the effectiveness of the adversarial defense approach, we utilized three widely used tracking datasets: OTB100 (Wu et al., 2015), VOT2019 (Kristan et al., 2019), and UAV123 (Mueller et al., 2016).
Researcher Affiliation | Collaboration | Jianlang Chen (1), Xuhong Ren (2), Qing Guo (3), Felix Juefei-Xu (4), Di Lin (5), Wei Feng (5), Lei Ma (6,7), Jianjun Zhao (1); affiliations: (1) Kyushu University, Japan; (2) Tianjin University of Technology, China; (3) CFAR and IHPC, Agency for Science, Technology and Research (A*STAR), Singapore; (4) GenAI, Meta, USA; (5) Tianjin University, China; (6) The University of Tokyo, Japan; (7) University of Alberta, Canada
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks (e.g., a section explicitly labeled 'Algorithm' or 'Pseudocode').
Open Source Code | Yes | We have built a benchmark and released our code in https://github.com/tsingqguo/robustOT. To facilitate the reproducibility of our approach, we have open-sourced our code and provided a benchmark that includes our method, which is accessible via https://github.com/tsingqguo/robustOT.
Open Datasets | Yes | Training datasets. We employ three widely-used datasets, i.e., ImageNet-DET (Russakovsky et al., 2015), ImageNet-VID, and YouTube-BoundingBoxes (Real et al., 2017) to train the STIR.
Dataset Splits | Yes | We have sampled around 490,000 pairs for training STIR and LResampleNet, and 20,000 pairs as the validation set.
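
As a rough illustration of the quoted split sizes, the following PyTorch sketch draws a 490,000 / 20,000 train/validation split from a placeholder pool of sampled pairs. The `pairs` list, the fixed seed, and the use of `random_split` are assumptions for illustration, not the authors' actual data pipeline.

```python
# Illustrative sketch of the reported ~490,000 / 20,000 train/validation split.
# The `pairs` list is a stand-in for the sampled (sequence, template) training pairs.
import torch
from torch.utils.data import random_split

pairs = list(range(510_000))                   # placeholder for ~510k sampled pairs
num_train, num_val = 490_000, 20_000
generator = torch.Generator().manual_seed(0)   # fixed seed so the split is reproducible
train_pairs, val_pairs = random_split(pairs, [num_train, num_val], generator=generator)
print(len(train_pairs), len(val_pairs))        # 490000 20000
```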
Hardware Specification | Yes | We train and run our method on a server with an NVIDIA RTX A6000 GPU and an Intel Core i9-10980XE 3.0GHz CPU using PyTorch (Paszke et al., 2019).
Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019)' but does not provide a specific version number for the software dependency.
Experiment Setup | Yes | Architectures. We set fθsp and fθtp to be five-layer MLPs with a ReLU activation layer, and the hidden dimensions are 256. We use the network of (Lim et al., 2017) without the upsampling modules as the encoder for extracting pixel features (i.e., fβ), which can generate a feature with the same size as the input image. Loss function. Given an attacked image sequence V = {I_τ}_{τ=t−N}^{t} and the object template T, we obtain the reconstructed t-th frame Î_t. When we have the clean version of Î_t (i.e., I′_t), we follow existing works and only use the L1 loss function to train the STIR and LResampleNet. Training datasets. We employ three widely-used datasets, i.e., ImageNet-DET (Russakovsky et al., 2015), ImageNet-VID, and YouTube-BoundingBoxes (Real et al., 2017) to train the STIR. Specifically, given a randomly sampled video, we randomly select five continuous frames in the video to form an image sequence and crop the object template T from another randomly chosen frame. Then, we add adversarial perturbations to the image sequence and regard the perturbed sequence as the V in Equation 4. Here, we apply the FGSM attack on a pre-trained SiamRPN++ with ResNet50 tracker to produce adversarial perturbations. After that, we have a pair of V and T as the training sample. We have sampled around 490,000 pairs for training STIR and LResampleNet, and 20,000 pairs as the validation set. We train the STIR and LResampleNet independently since they have different functionalities, and joint training could hardly get good results for both modules.
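
The architecture and loss details quoted above translate directly into code. Below is a minimal PyTorch sketch, not the authors' released STIR/LResampleNet implementation, of a five-layer MLP head with ReLU activations and hidden width 256 trained with an L1 reconstruction loss; the class and function names (`ImplicitMLP`, `l1_training_step`), the input packing, and the dimensions in the usage example are illustrative assumptions.

```python
# Minimal PyTorch sketch of the MLP heads and L1 training objective described above.
# Names and the input packing are illustrative assumptions, not the released code.
import torch
import torch.nn as nn

class ImplicitMLP(nn.Module):
    """Five-layer MLP with ReLU activations and hidden width 256,
    mirroring the reported configuration of fθsp / fθtp."""
    def __init__(self, in_dim: int, out_dim: int, hidden_dim: int = 256, num_layers: int = 5):
        super().__init__()
        layers, dim = [], in_dim
        for _ in range(num_layers - 1):
            layers += [nn.Linear(dim, hidden_dim), nn.ReLU(inplace=True)]
            dim = hidden_dim
        layers.append(nn.Linear(dim, out_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def l1_training_step(model: nn.Module, optimizer: torch.optim.Optimizer,
                     query: torch.Tensor, clean_target: torch.Tensor) -> float:
    """One optimization step: regress the model's reconstruction onto its clean
    counterpart with the L1 loss, as described for training STIR and LResampleNet."""
    optimizer.zero_grad()
    prediction = model(query)                      # e.g., per-pixel coordinates + features
    loss = nn.functional.l1_loss(prediction, clean_target)
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with random tensors standing in for pixel queries and clean RGB targets.
mlp = ImplicitMLP(in_dim=258, out_dim=3)           # 256-d feature + 2-d coordinate (assumed)
opt = torch.optim.Adam(mlp.parameters(), lr=1e-4)
loss_value = l1_training_step(mlp, opt, torch.randn(1024, 258), torch.rand(1024, 3))
```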