Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Reproducibility Study on Adversarial Attacks Against Robust Transformer Trackers
Authors: Fatemeh Nourilenjan Nokabadi, Jean-Francois Lalonde, Christian Gagné
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper focuses on understanding how transformer trackers behave under adversarial attacks and how different attacks perform on tracking datasets as their parameters change. We conducted a series of experiments to evaluate the effectiveness of existing adversarial attacks on object trackers with transformer and non-transformer backbones. We experimented on 7 different trackers, including 3 that are transformer-based, and 4 which leverage other architectures. These trackers are tested against 4 recent attack methods to assess their performance and robustness on VOT2022ST, UAV123 and GOT10k datasets. Our empirical study focuses on evaluating adversarial robustness of object trackers based on bounding box versus binary mask predictions, and attack methods at different levels of perturbations. |
| Researcher Affiliation | Academia | Fatemeh Nourilenjan Nokabadi (IID, Université Laval & Mila); Jean-François Lalonde (IID, Université Laval); Christian Gagné (IID, Université Laval; Canada CIFAR AI Chair, Mila) |
| Pseudocode | Yes | Algorithm 1 — RTAA (Jia et al., 2020) algorithm as the adversarial attack for object trackers:<br>1: P ← P(t−1) ▷ initialize with perturbation map of previous frame<br>2: I_adv ← I ▷ initialize with clean current frame<br>3: for i = 1, …, i_max do<br>4: I_adv ← I_adv + ϕ_ϵ(P + α·sign(∇_{I_adv} L)) ▷ adversarial gradient step<br>5: I_adv ← max(0, min(I_adv, 255)) ▷ clamp image values to [0, 255]<br>6: P ← I_adv − I ▷ update perturbation map<br>7: Return I_adv, P ▷ adversarial image and corresponding perturbation map<br>Algorithm 2 — SPARK (Guo et al., 2020) algorithm as the adversarial attack for object trackers:<br>1: P ← P(t−1) ▷ initialize with perturbation map of previous frame<br>2: S ← Σ_{i=1..K} P(t−i) ▷ sum of perturbation maps of last K frames<br>3: I_adv ← I ▷ initialize with clean current frame image<br>4: for i = 1, …, i_max do<br>5: Ī ← I_adv ▷ get a copy of current adversarial image<br>6: I_adv ← Ī + ϕ_ϵ(P − α·sign(∇_Ī L)) + S ▷ adversarial gradient step<br>7: I_adv ← max(0, min(I_adv, 255)) ▷ clamp image values to [0, 255]<br>8: P ← I_adv − Ī − S ▷ update perturbation map<br>9: S ← Σ_{i=1..K} P(t−i) ▷ update the sum of perturbation maps<br>10: I_adv ← Ī + S ▷ generate the current adversarial frame<br>11: Return I_adv, P ▷ adversarial image and corresponding perturbation map |
| Open Source Code | Yes | The code necessary to reproduce this study is available at https://github.com/fatemehN/ReproducibilityStudy. |
| Open Datasets | Yes | These trackers are tested against 4 recent attack methods to assess their performance and robustness on VOT2022ST, UAV123 and GOT10k datasets. Evaluation Protocol We selected the VOT2022 Short-term dataset and protocol (Kristan et al., 2023) because of these three reasons. Evaluation Protocol The test sets of the experiments on the perturbation level changes are the UAV123 dataset (Mueller et al., 2016) and VOTST2022 (Kristan et al., 2023). Evaluation Protocol This experiment is performed on the GOT10k dataset (Huang et al., 2019) which uses OPE protocol and the results are reported for three metrics: Average Overlap (AO), Success Rate SR0.5 with threshold 0.5 and Success Rate SR0.75 with threshold 0.75. |
| Dataset Splits | Yes | Evaluation Protocol We selected the VOT2022 Short-term dataset and protocol (Kristan et al., 2023) because of these three reasons. Unlike other datasets that use the one-pass evaluation protocol, the VOT2022ST follows the anchor-based short-term protocol for the trackers' evaluation... In every video sequence evaluation under this protocol, the evaluation toolkit will reinitialize the tracker from the next anchor of the data to compute the anchor-based metrics wherever a tracking failure happens. Evaluation Protocol The test sets of the experiments on the perturbation level changes are the UAV123 dataset (Mueller et al., 2016) and VOT2022ST (Kristan et al., 2023). The UAV123 dataset comprises 123 video sequences... We calculate success and precision rates across various thresholds under the One Pass Evaluation (OPE) protocol. In this setup, the object tracker is initialized using the first frame and the corresponding bounding box. Subsequently, the tracker is evaluated on each frame's prediction for the rest of the video sequence. Evaluation Protocol This experiment is performed on the GOT10k dataset (Huang et al., 2019), which uses the OPE protocol; the results are reported for three metrics: Average Overlap (AO), Success Rate SR0.5 with threshold 0.5, and Success Rate SR0.75 with threshold 0.75. The GOT10k test set contains 180 video sequences with axis-aligned bounding box annotations. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or memory) are provided for the experiments. The acknowledgements mention support for the DEEL Project but not specific hardware used for this study's experiments. |
| Software Dependencies | No | The paper mentions that official repositories and fine-tuned networks from original authors are used, but it does not specify the version numbers for general software dependencies like Python, PyTorch, or CUDA that were used in *this study's* environment for reproducibility. |
| Experiment Setup | Yes | Attacks Setting The SPARK (Guo et al., 2020) and RTAA (Jia et al., 2020) approaches applied on the TransT tracker (Chen et al., 2021) are assessed in this experiment using the OPE protocol. We chose the TransT tracker, a pioneer among transformer trackers, to observe how attack performance changes with the perturbation level. Both attacks generate the perturbed search region over a fixed number of iterations (10). The step size α for the gradient update is 1 for RTAA and 0.3 for SPARK. We used five levels of perturbation ϵ ∈ {2.55, 5.1, 10.2, 20.4, 40.8} to compare its effects on the TransT (Chen et al., 2021) performance on the UAV123 (Mueller et al., 2016) and VOT2022ST (Kristan et al., 2023) datasets. The ϵ values are selected as a set of coefficients {0.01, 0.02, 0.04, 0.08, 0.16} of the maximum pixel value 255 in an RGB image. It is worth mentioning that ϵ for both attacks is set to 10 in their original settings; therefore, the original performance of each attack is very close to the ϵ3 = 10.2 perturbation level. Attack Setting The IoU method (Jia et al., 2021) is a black-box attack on object trackers... In our experiment, we set a limit of 10 steps in the algorithm's last loop to reduce the processing time, especially for the larger upper bound ζ values. We tested the IoU attack under three upper bounds: ζ ∈ {8000, 10000, 12000}. The middle value of ζ = 10000 corresponds to the original setting of the IoU attack (Jia et al., 2021). |
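The RTAA-style iterative update quoted in the Pseudocode row, together with the ϵ levels from the Experiment Setup row, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: `grad` here is a fixed stand-in for the gradient of the tracker's loss, which the real attack recomputes from the tracker at every iteration, and `rtaa_step` is a hypothetical helper name.

```python
import numpy as np

# Perturbation levels from the paper: coefficients of the max pixel value 255.
COEFFS = [0.01, 0.02, 0.04, 0.08, 0.16]
EPSILONS = [c * 255 for c in COEFFS]  # {2.55, 5.1, 10.2, 20.4, 40.8}

def rtaa_step(image, prev_perturbation, grad, alpha=1.0, eps=10.2, steps=10):
    """Sign-gradient attack loop in the spirit of RTAA's Algorithm 1.

    Each iteration takes a sign-gradient step, projects the perturbation
    onto the eps-ball (the phi_eps clipping), clamps pixels to [0, 255],
    and re-derives the effective perturbation map P = I_adv - I.
    """
    perturbation = np.asarray(prev_perturbation, dtype=float).copy()
    image = np.asarray(image, dtype=float)
    adv = image.copy()
    for _ in range(steps):
        # gradient ascent step on the loss, clipped to the eps-ball
        perturbation = np.clip(perturbation + alpha * np.sign(grad), -eps, eps)
        # apply the perturbation and clamp to the valid pixel range
        adv = np.clip(image + perturbation, 0.0, 255.0)
        # update the perturbation map after clamping
        perturbation = adv - image
    return adv, perturbation
```

With α = 1 and the paper's default ϵ = 10.2, ten iterations saturate the perturbation near the ϵ-ball boundary, which is why the authors note the original attack settings sit closest to the ϵ3 = 10.2 level.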
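The GOT10k metrics quoted above (AO, SR0.5, SR0.75) reduce to simple statistics over per-frame overlaps. The sketch below is a simplified per-frame version for illustration; the official GOT-10k toolkit averages per sequence before averaging over the test set, and `got10k_metrics` is a hypothetical helper, not the toolkit's API.

```python
import numpy as np

def got10k_metrics(overlaps):
    """Summarize per-frame IoU overlaps under the OPE protocol.

    AO is the mean overlap over all frames; SR@t is the fraction of
    frames whose overlap exceeds the threshold t (here 0.5 and 0.75).
    """
    overlaps = np.asarray(overlaps, dtype=float)
    ao = overlaps.mean()          # Average Overlap
    sr_050 = (overlaps > 0.5).mean()   # Success Rate at threshold 0.5
    sr_075 = (overlaps > 0.75).mean()  # Success Rate at threshold 0.75
    return ao, sr_050, sr_075
```

For example, overlaps of [1.0, 0.6, 0.8, 0.4] give AO = 0.7, SR0.5 = 0.75, and SR0.75 = 0.5.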