Temporal Relational Modeling with Self-Supervision for Action Segmentation
Authors: Dong Wang, Di Hu, Xingjian Li, Dejing Dou (pp. 2729-2737)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our DTGRM model outperforms state-of-the-art action segmentation models on three challenging datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset. Experiments on multiple benchmark datasets demonstrate the effectiveness of the proposed DTGRM for addressing the action segmentation task. In this section, we compare the proposed model with several state-of-the-art models on three datasets: 50Salads, GTEA, and the Breakfast dataset. The results are presented in Table 1. |
| Researcher Affiliation | Collaboration | Dong Wang1, Di Hu2,3, Xingjian Li4, Dejing Dou4. 1School of Computer Science and Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, China. 2Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100872, China. 3Beijing Key Laboratory of Big Data Management and Analysis Methods. 4Big Data Laboratory, Baidu Research. |
| Pseudocode | No | The paper describes the model and methods using textual descriptions and mathematical equations, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/redwang/DTGRM. |
| Open Datasets | Yes | The 50Salads (Stein and Mc Kenna 2013) dataset consists of 50 videos..., The GTEA (Fathi, Ren, and Rehg 2011) dataset contains 28 videos..., The Breakfast (Kuehne, Arslan, and Serre 2014) dataset is the largest among the three datasets... |
| Dataset Splits | No | The paper describes the datasets used (50Salads, GTEA, Breakfast) and their characteristics, but does not provide explicit training, validation, and test split percentages or sample counts, nor does it refer to specific predefined splits by citation within the context of the experiment setup. |
| Hardware Specification | Yes | The whole model proposed in this paper consists of one backbone network and three DTGRMs (i.e., S = 3) that are implemented with Pytorch library on Nvidia 2080Ti GPU. |
| Software Dependencies | No | The paper states implementation with 'Pytorch library' but does not specify its version number or any other software dependencies with their respective versions. |
| Experiment Setup | Yes | We set the dimension of hidden representation d as 64 for backbone network and our DTGRMs. The proposed DTGRM constructs K = 10 dilated temporal graphs and apply DRGC layer on each level, where the dilation factor is doubled at each level. For hyperparameter η in auxiliary self-supervised task, we set it as η = 20. For the loss function, we set ω = 0.15, λe = 2 and λc = 0.5. In all experiments, the network is trained using Adam optimizer with a learning rate of 5e-4. |
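The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is our own illustrative summary, not code from the paper's repository; the dictionary keys and variable names are hypothetical, while the values are those reported above.

```python
# Hedged sketch: the hyperparameters reported in the paper, gathered in one
# place. Key names are our own; only the values come from the paper.
config = {
    "hidden_dim": 64,       # d, hidden representation size
    "num_levels": 10,       # K, number of dilated temporal graphs
    "eta": 20,              # η, auxiliary self-supervised task hyperparameter
    "omega": 0.15,          # ω, loss weight
    "lambda_e": 2.0,        # λe
    "lambda_c": 0.5,        # λc
    "num_stages": 3,        # S = 3 DTGRMs (from the Hardware Specification row)
    "optimizer": "Adam",
    "learning_rate": 5e-4,
}

# "The dilation factor is doubled at each level": 1, 2, 4, ..., 2^(K-1).
dilations = [2 ** k for k in range(config["num_levels"])]
print(dilations)  # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
```

With K = 10 and doubling per level, the top graph spans a dilation of 512 frames, which is how the model captures long-range temporal relations without deepening the network.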