Temporal Relational Modeling with Self-Supervision for Action Segmentation

Authors: Dong Wang, Di Hu, Xingjian Li, Dejing Dou (pp. 2729-2737)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our DTGRM model outperforms state-of-the-art action segmentation models on three challenging datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset. Experiments on multiple benchmark datasets demonstrate the effectiveness of the proposed DTGRM for addressing the action segmentation task. The paper compares the proposed model with several state-of-the-art models on these three datasets and presents the results in Table 1.
Researcher Affiliation | Collaboration | Dong Wang1, Di Hu2,3, Xingjian Li4, Dejing Dou4. 1School of Computer Science and Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, China; 2Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100872, China; 3Beijing Key Laboratory of Big Data Management and Analysis Methods; 4Big Data Laboratory, Baidu Research
Pseudocode | No | The paper describes the model and methods using textual descriptions and mathematical equations, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/redwang/DTGRM.
Open Datasets | Yes | The 50Salads (Stein and McKenna 2013) dataset consists of 50 videos..., the GTEA (Fathi, Ren, and Rehg 2011) dataset contains 28 videos..., and the Breakfast (Kuehne, Arslan, and Serre 2014) dataset is the largest among the three datasets...
Dataset Splits | No | The paper describes the datasets used (50Salads, GTEA, Breakfast) and their characteristics, but does not provide explicit training, validation, and test split percentages or sample counts, nor does it refer to specific predefined splits by citation within the experiment setup.
Hardware Specification | Yes | The whole model proposed in this paper consists of one backbone network and three DTGRMs (i.e., S = 3) that are implemented with Pytorch library on Nvidia 2080Ti GPU.
Software Dependencies | No | The paper states the implementation uses the 'Pytorch library' but does not specify its version number or any other software dependencies with their respective versions.
Experiment Setup | Yes | We set the dimension of hidden representation d as 64 for the backbone network and our DTGRMs. The proposed DTGRM constructs K = 10 dilated temporal graphs and applies a DRGC layer on each level, where the dilation factor is doubled at each level. For the hyperparameter η in the auxiliary self-supervised task, we set η = 20. For the loss function, we set ω = 0.15, λe = 2, and λc = 0.5. In all experiments, the network is trained using the Adam optimizer with a learning rate of 5e-4.
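The reported hyperparameters can be collected into a small configuration sketch. This is an illustrative reconstruction, not the authors' released code: the names `build_dilation_schedule` and `CONFIG` are assumptions, but the values (d = 64, S = 3, K = 10, doubling dilation, η = 20, ω = 0.15, λe = 2, λc = 0.5, Adam with lr 5e-4) come directly from the paper's setup description.

```python
def build_dilation_schedule(num_levels=10, base=1):
    """Dilation factor is doubled at each of the K graph levels,
    as described in the paper (1, 2, 4, ..., 2^(K-1))."""
    return [base * (2 ** k) for k in range(num_levels)]


# Hypothetical container for the reported experiment settings.
CONFIG = {
    "hidden_dim": 64,        # d: hidden representation size (backbone and DTGRMs)
    "num_stages": 3,         # S = 3 DTGRM stages after the backbone
    "num_graph_levels": 10,  # K = 10 dilated temporal graphs
    "eta": 20,               # η: auxiliary self-supervised task hyperparameter
    "omega": 0.15,           # ω: loss-function weight
    "lambda_e": 2.0,         # λe
    "lambda_c": 0.5,         # λc
    "learning_rate": 5e-4,   # Adam optimizer learning rate
}

dilations = build_dilation_schedule(CONFIG["num_graph_levels"])
print(dilations)  # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
```

With doubling at each level, the largest graph spans a temporal neighborhood of 2^(K-1) = 512 frames, which is how the model captures long-range temporal relations without deep stacking.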