ViSTec: Video Modeling for Sports Technique Recognition and Tactical Analysis

Authors: Yuchen He, Zeqing Yuan, Yihong Wu, Liqi Cheng, Dazhen Deng, Yingcai Wu

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that our method outperforms existing models by a significant margin. Case studies with experts from the Chinese national table tennis team validate our model's capacity to automate analysis for technical actions and tactical strategies. We perform comparative experiments with state-of-the-art action segmentation models and conduct an ablation study to examine individual components.
Researcher Affiliation | Academia | Yuchen He*, Zeqing Yuan*, Yihong Wu, Liqi Cheng, Dazhen Deng, Yingcai Wu. Zhejiang University. {heyuchen, leoyuan, wuyihong, lycheecheng, dengdazhen, ycwu}@zju.edu.cn
Pseudocode | Yes | Algorithm 1: Updating W_tec^p. Input: weight vector W_tec^p, predicted label of the current segment tec_pred, and ground-truth label tec_gt. Output: updated weight vector W'_tec^p. 1: Initialize: W'_tec^p ← W_tec^p. 2: W'_tec^p[tec_pred] ← (1 − βU(cls(f_i))) · W'_tec^p[tec_pred]. 3: W'_tec^p[tec_gt] ← (1 + βU(cls(f_i))) · W'_tec^p[tec_gt]. 4: Normalization: W'_tec^p ← W'_tec^p / max(W'_tec^p).
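The weight-update step in Algorithm 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `update_weights` and the scalar `uncertainty` argument (standing in for the paper's U(cls(f_i)) term) are assumptions made for the example.

```python
import numpy as np

def update_weights(w_tec, tec_pred, tec_gt, beta, uncertainty):
    """Sketch of Algorithm 1: adjust per-class weights based on a prediction.

    `uncertainty` stands in for the paper's U(cls(f_i)) term; `beta` is the
    scaling hyperparameter from the paper.
    """
    w = w_tec.copy()                          # W'_tec^p <- W_tec^p
    w[tec_pred] *= (1 - beta * uncertainty)   # down-weight the predicted class
    w[tec_gt] *= (1 + beta * uncertainty)     # up-weight the ground-truth class
    return w / w.max()                        # normalize by the maximum weight

# Example: a misprediction (predicted class 0, true class 2) shifts weight
# toward the ground-truth class.
w = update_weights(np.array([1.0, 1.0, 1.0]), tec_pred=0, tec_gt=2,
                   beta=0.5, uncertainty=0.4)
```

Note that when the prediction is correct (tec_pred == tec_gt), steps 2 and 3 partially cancel, so confident correct predictions perturb the weights only slightly.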
Open Source Code | No | More details are available at: https://ViSTec2024.github.io/. The paper provides a project-page URL, which may host code, but the paper text itself contains no direct repository link and no explicit statement that the code for the methodology has been released.
Open Datasets | No | All experiments are performed on a dataset constructed from broadcast videos of World Table Tennis (WTT) games. We collected 4000 rally clips segmented from 18 games by recognizing scoreboard changes (Deng et al. 2021). The paper describes how the authors constructed their own dataset but provides no concrete access information for it (e.g., a link, DOI, repository, or explicit statement of public availability).
Dataset Splits | No | The paper states that the dataset is used for training and evaluation but does not specify the splits: no percentages, sample counts, or citations to predefined training, validation, and test partitions.
Hardware Specification | Yes | Furthermore, offline tests on a single A100 GPU show ViSTec achieving an inference speed of 39.3 frames per second, which exceeds the typical frame rate of broadcast match videos, enabling real-time processing.
Software Dependencies | No | The paper mentions using VideoMAE as a backbone and describes the network architectures, but it does not specify any software dependencies with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, or particular library versions).
Experiment Setup | No | The paper describes components of its model and training process, such as slice length, loss functions, and the hyperparameters α and β, but it does not provide concrete values for standard experimental settings: learning rate, batch size, number of epochs, or the optimizer and its parameters.