TransVCL: Attention-Enhanced Video Copy Localization Network with Flexible Supervision

Authors: Sifeng He, Yue He, Minlong Lu, Chen Jiang, Xudong Yang, Feng Qian, Xiaobo Zhang, Lei Yang, Jiandong Zhang

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our proposed method on two current segment-level annotated video copy datasets, VCSL (He et al. 2022) and VCDB (Jiang, Jiang, and Wang 2014). The experiments show that TransVCL outperforms other algorithms by a large margin with much more accurate copied segment localization.
Researcher Affiliation | Collaboration | Ant Group; Copyright Protection Center of China. Contact: {sifeng.hsf, youzhi.qf}@antgroup.com
Pseudocode | No | The paper describes its methods in text and uses mathematical equations, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is publicly available at https://github.com/transvcl/TransVCL.
Open Datasets | Yes | We use the VCSL dataset for method comparisons. In addition, we also evaluate results on the VCDB dataset, which is smaller in scale but also contains ground-truth copied segments. Besides these two existing datasets with detailed segment-level copied annotations, we further utilize video-level copy datasets (FIVR and SVD) as weakly labeled data for semi-supervised evaluation.
Dataset Splits | No | The paper refers to using the VCSL and VCDB datasets for evaluation, and describes how labeled training data is sampled for semi-supervised settings (e.g., 'randomly sample 1%, 2%, 5% and 10% of labeled training data in VCSL and use rest of training data as an unlabeled or weakly labeled set'; a hedged sketch of this sampling appears after the table), but does not explicitly provide a separate validation split for its main supervised experiments.
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions using 'ISC features', 'YOLOX (Ge et al. 2021)', and 'SGD with momentum', but does not provide version numbers for software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | In the training stage of our experiments, all video feature sequences are truncated to a maximum length of 1200 (i.e., 20 min) and zero-padded when shorter than 1200. Therefore, the input to the TransVCL network is pairs of video features with a uniform size of 1200 × 256. In the feature enhancement component, the number of Transformer heads is 8. In the similarity generation component, the temperature τ of dual-softmax is 0.1 and the similarity matrix is reshaped to (640, 640). In the copied segment localization module, we adopt the anchor-free detection network YOLOX (Ge et al. 2021) with a simple design, and the regression loss weight λ is 5 by default. The entire model is trained using SGD with momentum 0.9, batch size 64, initial learning rate 0.01, and weight decay 0.0005. (The second sketch below illustrates this input pipeline.)
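
To make the semi-supervised sampling protocol quoted in the Dataset Splits row concrete, here is a minimal Python sketch of drawing a 1%/2%/5%/10% labeled subset and treating the remainder as the unlabeled or weakly labeled set. Function names, pair identifiers, and the seed are illustrative; the paper does not release its split files or random seeds.

```python
import random

def sample_labeled_subset(train_pair_ids, ratio, seed=0):
    """Randomly pick `ratio` of training pairs as the labeled set;
    the remainder serves as the unlabeled / weakly labeled set."""
    rng = random.Random(seed)          # fixed seed only for reproducibility of this demo
    shuffled = list(train_pair_ids)
    rng.shuffle(shuffled)
    n_labeled = max(1, int(len(shuffled) * ratio))
    return shuffled[:n_labeled], shuffled[n_labeled:]

# Placeholder pair ids, not the real VCSL training pairs
train_pairs = [f"pair_{i}" for i in range(1000)]
for ratio in (0.01, 0.02, 0.05, 0.10):   # the 1%, 2%, 5%, 10% settings reported for VCSL
    labeled, unlabeled = sample_labeled_subset(train_pairs, ratio)
    print(ratio, len(labeled), len(unlabeled))
```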
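The Experiment Setup row fully specifies the input pipeline, so a short sketch can illustrate two of its steps: truncating or zero-padding each feature sequence to length 1200, and applying dual-softmax with temperature 0.1 to the frame-level similarity matrix. This assumes PyTorch and uses illustrative function names; it is a sketch of the described configuration, not the authors' released code.

```python
import torch
import torch.nn.functional as F

MAX_LEN, FEAT_DIM, TAU = 1200, 256, 0.1   # values quoted from the paper

def pad_or_truncate(feats: torch.Tensor) -> torch.Tensor:
    """Truncate a (T, 256) feature sequence to 1200 frames, or zero-pad shorter ones."""
    t = feats.shape[0]
    if t >= MAX_LEN:
        return feats[:MAX_LEN]
    return F.pad(feats, (0, 0, 0, MAX_LEN - t))   # append zero rows along the time axis

def dual_softmax_similarity(q: torch.Tensor, r: torch.Tensor) -> torch.Tensor:
    """Dual-softmax over the frame-to-frame similarity matrix with temperature 0.1."""
    sim = q @ r.T / TAU                            # (1200, 1200) raw similarities
    return F.softmax(sim, dim=0) * F.softmax(sim, dim=1)

query = pad_or_truncate(torch.randn(900, FEAT_DIM))    # dummy features, not ISC features
ref = pad_or_truncate(torch.randn(1500, FEAT_DIM))
sim_map = dual_softmax_similarity(query, ref)
# Per the paper, this map is then reshaped to (640, 640) before the YOLOX-style detector.
```

Multiplying the two softmaxes requires a match to score highly in both directions, which suppresses one-sided, noisy frame correspondences before the detection stage.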