On the Robustness of Neural-Enhanced Video Streaming against Adversarial Attacks
Authors: Qihua Zhou, Jingcai Guo, Song Guo, Ruibin Li, Jie Zhang, Bingjie Wang, Zhenda Xu
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluation based on a state-of-the-art video codec benchmark illustrates that our attack significantly degrades the recovery performance of NeVS over previous attack methods. The damaged video quality finally leads to obvious malfunction of downstream tasks with over 75% success rate. Extensive experiments based on the typical UDM10 (Yang et al. 2019) benchmark demonstrate that codec hijacking significantly deteriorates the restoration performance of NeVS using different SR models, and finally results in downstream task malfunction with over 75% success rate, including multiple object tracking and human pose estimation. |
| Researcher Affiliation | Academia | Qihua Zhou1, Jingcai Guo1,2*, Song Guo3, Ruibin Li1, Jie Zhang1, Bingjie Wang1, Zhenda Xu1; 1The Hong Kong Polytechnic University, Hong Kong; 2The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, China; 3The Hong Kong University of Science and Technology, Hong Kong |
| Pseudocode | Yes | Algorithm 1: Distortion-oriented QP Matrix Controlling |
| Open Source Code | No | The paper does not explicitly state that the source code for the methodology is openly available or provide a link to a code repository. |
| Open Datasets | Yes | We employ the typical UDM10 (Yang et al. 2019) benchmark, covering the downstream tasks of object tracking and human pose estimation. The video codecs are based on the widely-used H.264/AVC (H.264 2023) and H.265/HEVC (H.265 2023) standards. As to the video enhancement module, we consider 11 typical super-resolution models with diverse architectures and parameter sizes, including EDSR (Lim et al. 2017), EUSR (Choi et al. 2020), DBPN (Haris, Shakhnarovich, and Ukita 2018), RCAN (Zhang et al. 2018), MSRN (Li et al. 2018), 4PP-EUSR (Choi et al. 2020), ESRGAN (Wang et al. 2018), RRDB (Wang et al. 2018), CARN (Ahn, Kang, and Sohn 2018), FRSR (Soh et al. 2019) and NATSR (Soh et al. 2019). Multiple object tracking. Table 2 shows the performance comparison based on the challenging Multiple Object Tracking dataset (MOTChallenge 2023). Human pose estimation. We deploy human pose estimation on the Human3.6M dataset. |
| Dataset Splits | No | The paper mentions using specific datasets (UDM10, MOTChallenge 2023, Human3.6M) and standard models, but it does not explicitly detail the training, validation, and test splits (e.g., percentages, sample counts, or specific split files); it implicitly relies on standard benchmark practices that are not defined in the text. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | Yes | The video codecs are based on the widely-used H.264/AVC (H.264 2023) and H.265/HEVC (H.265 2023) standards. The codec information can be accessed by probing original LR videos, such as using ffprobe (ffprobe 2023). |
| Experiment Setup | Yes | The video codecs are based on the widely-used H.264/AVC (H.264 2023) and H.265/HEVC (H.265 2023) standards. Thus, the bitstream size budget (upper bound) is the size achieved by vanilla codecs on clean data, with constant QP 23, medium preset, and yuv420p pixel format. The perturbation follows the $\ell_\infty$-norm constraint, i.e., $\|x_i - \hat{x}_i\|_\infty \le \alpha$, where $\alpha$ is the distortion upper bound, following the Iterative Fast Gradient Sign Method (I-FGSM) (Kurakin, Goodfellow, and Bengio 2017; Choi et al. 2019) update rule. We can initialize $\delta_j^0$ by Gaussian-distributed noise and iteratively optimize the global perturbation as: $\delta_j^{t+1} = \mathrm{Clip}_{-\alpha,\alpha}\big(\delta_j^t + \tfrac{\alpha}{T}\,\mathrm{sign}(\nabla L(\delta_j^t))\big)$ (2), where the superscript $t$, $T$ and $\mathrm{sign}(\nabla L(\cdot))$ represent the iteration index, the maximum number of iterations, and the sign of the gradient, respectively. Algorithm 1: Distortion-oriented QP Matrix Controlling (line 1: set the maximum epochs $E$; line 2: $S \leftarrow 32$, the initial QP controlling stride; line 3: $Q \leftarrow 0$, uniformly initialize the QP matrix). |
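The I-FGSM update quoted in the Experiment Setup row can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation: the function name `ifgsm_perturbation`, the stand-in gradient `grad_fn`, and all parameter values are assumptions, and the real attack would compute the gradient of the paper's codec-hijacking loss instead of the toy quadratic used here.

```python
import numpy as np

def ifgsm_perturbation(grad_fn, shape, alpha=0.03, T=10, seed=None):
    """Iteratively optimize a perturbation delta under an L-infinity
    bound alpha, following the quoted I-FGSM rule:
        delta_{t+1} = Clip_{-alpha,alpha}(delta_t + (alpha/T) * sign(grad L(delta_t)))
    grad_fn(delta) must return the gradient of the attack loss
    with respect to delta.
    """
    rng = np.random.default_rng(seed)
    # Initialize delta^0 from Gaussian noise, as stated in the paper,
    # then project it into the feasible [-alpha, alpha] box.
    delta = np.clip(rng.normal(scale=alpha / 4, size=shape), -alpha, alpha)
    for _ in range(T):
        delta = delta + (alpha / T) * np.sign(grad_fn(delta))
        delta = np.clip(delta, -alpha, alpha)  # enforce the L-inf constraint
    return delta

# Toy usage with the gradient of L(delta) = 0.5 * ||delta - target||^2.
target = np.full((4, 4), 0.1)
delta = ifgsm_perturbation(lambda d: d - target, shape=(4, 4),
                           alpha=0.03, T=5, seed=0)
```

The per-step size alpha/T guarantees that T sign-gradient steps can never overshoot the distortion budget, and the explicit clip keeps the constraint tight even after the Gaussian initialization.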