Hierarchical Attention Based Spatial-Temporal Graph-to-Sequence Learning for Grounded Video Description

Authors: Kai Shen, Lingfei Wu, Fangli Xu, Siliang Tang, Jun Xiao, Yueting Zhuang

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our extensive experiments demonstrate the effectiveness of our proposed method compared to state-of-the-art methods." and "We conduct our experiments on the Grounded ActivityNet Entities Dataset [Zhou et al., 2019] for evaluation."
Researcher Affiliation | Collaboration | Kai Shen1, Lingfei Wu2, Fangli Xu3, Siliang Tang1, Jun Xiao1 and Yueting Zhuang1; 1Zhejiang University, 2IBM Research, 3Squirrel AI Learning; {shenkai,siliang,junx,yzhuang}@zju.edu.cn, wuli@us.ibm.com, lili@yixue.us
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | No | The paper does not provide a repository link or an explicit statement about releasing the source code for the described methodology.
Open Datasets | Yes | "We conduct our experiments on the Grounded ActivityNet Entities Dataset [Zhou et al., 2019] for evaluation."
Dataset Splits | Yes | "For a fair comparison, the data processing procedure is the same to [Zhou et al., 2019]." and "Table 2: Results on Grounded ActivityNet-Entities val set."
Hardware Specification | No | The paper does not provide specific hardware details such as the exact GPU or CPU models used for running its experiments.
Software Dependencies | No | The paper mentions components such as 'Faster R-CNN' and a 'ResNeXt-101 backbone' but does not specify version numbers for any libraries, frameworks, or other software dependencies needed for replication.
Experiment Setup | Yes | "Hyperparameter settings. We set the threshold ϵ value in Eq. 3 to 0.4, λa to 0.04, λb to 0.08, λc to 0.5, and number of heads m in Eq. 2 to 5. The KNN hyper-parameter p ∈ {5, 10, 20, 30, 40} vary in the experiments as a results of model validation. The region proposal feature's original dimension d is 2048, the region proposals embedding dimension l is 1024, the word embedding size is 512, rnn hidden size r is 1024 and GCN's layer k is 3. The λ in Eq. 4 is 0.8."
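
Since no source code is released, anyone attempting a reimplementation may find it convenient to collect the values quoted in the Experiment Setup row in a single configuration object. The following is a minimal sketch only: the class name `HyperParams`, every field name, and the helper `knn_sweep` are hypothetical and do not come from the paper; only the numeric values are taken from the quoted text. The default for `knn_p` is an arbitrary placeholder, since the paper reports a set of candidate values rather than one chosen setting.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

# Candidate values for the KNN hyper-parameter p, as listed in the quoted text.
KNN_P_GRID: Tuple[int, ...] = (5, 10, 20, 30, 40)


@dataclass(frozen=True)
class HyperParams:
    """Hypothetical container; field names are illustrative assumptions."""
    epsilon: float = 0.4           # threshold ϵ in Eq. 3
    lambda_a: float = 0.04         # λa
    lambda_b: float = 0.08         # λb
    lambda_c: float = 0.5          # λc
    num_heads: int = 5             # m in Eq. 2
    knn_p: int = 5                 # placeholder; the paper varies p over KNN_P_GRID
    region_feat_dim: int = 2048    # d, original region proposal feature dimension
    region_embed_dim: int = 1024   # l, region proposal embedding dimension
    word_embed_dim: int = 512      # word embedding size
    rnn_hidden_size: int = 1024    # r
    gcn_layers: int = 3            # k
    lambda_eq4: float = 0.8        # λ in Eq. 4


def knn_sweep(base: HyperParams) -> List[HyperParams]:
    """Return one config per candidate p, leaving all other values unchanged."""
    return [replace(base, knn_p=p) for p in KNN_P_GRID]


if __name__ == "__main__":
    for cfg in knn_sweep(HyperParams()):
        print(cfg.knn_p, cfg.num_heads, cfg.gcn_layers)
```

A frozen dataclass is used here so that each configuration in the sweep is an independent, immutable copy; this is a design preference of the sketch, not something the paper prescribes.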