Hierarchical Attention Based Spatial-Temporal Graph-to-Sequence Learning for Grounded Video Description
Authors: Kai Shen, Lingfei Wu, Fangli Xu, Siliang Tang, Jun Xiao, Yueting Zhuang
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments demonstrate the effectiveness of our proposed method compared to state-of-the-art methods. and We conduct our experiments on the Grounded ActivityNet Entities Dataset [Zhou et al., 2019] for evaluation. |
| Researcher Affiliation | Collaboration | Kai Shen¹, Lingfei Wu², Fangli Xu³, Siliang Tang¹, Jun Xiao¹ and Yueting Zhuang¹; ¹Zhejiang University, ²IBM Research, ³Squirrel AI Learning; {shenkai,siliang,junx,yzhuang}@zju.edu.cn, wuli@us.ibm.com, lili@yixue.us |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | The paper does not provide any specific repository link or explicit statement about the release of its source code for the methodology described. |
| Open Datasets | Yes | We conduct our experiments on the Grounded ActivityNet Entities Dataset [Zhou et al., 2019] for evaluation. |
| Dataset Splits | Yes | For a fair comparison, the data processing procedure is the same as [Zhou et al., 2019]. and Table 2: Results on Grounded ActivityNet-Entities val set. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU or CPU models used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like 'Faster R-CNN' and 'ResNeXt-101 backbone' but does not specify version numbers for any libraries, frameworks, or specific software dependencies needed for replication. |
| Experiment Setup | Yes | Hyperparameter settings. We set the threshold ϵ in Eq. 3 to 0.4, λa to 0.04, λb to 0.08, λc to 0.5, and the number of heads m in Eq. 2 to 5. The KNN hyper-parameter p ∈ {5, 10, 20, 30, 40} varies across experiments as a result of model validation. The region proposal feature's original dimension d is 2048, the region proposal embedding dimension l is 1024, the word embedding size is 512, the RNN hidden size r is 1024, and the number of GCN layers k is 3. The λ in Eq. 4 is 0.8. |
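
The hyperparameters reported in the Experiment Setup row can be collected into a single configuration object for a replication attempt. The following is a minimal sketch only: no source code was released, so the class name `HyperParams` and every field name are hypothetical, and only the numeric values come from the row above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HyperParams:
    """Hypothetical container; field names are assumptions, values are from the paper."""

    epsilon: float = 0.4            # threshold in Eq. 3
    lambda_a: float = 0.04          # weight lambda_a
    lambda_b: float = 0.08          # weight lambda_b
    lambda_c: float = 0.5           # weight lambda_c
    num_heads: int = 5              # m in Eq. 2
    # Candidate values of the KNN hyper-parameter p; the paper selects among
    # them by model validation and does not report a single fixed choice here.
    knn_p_candidates: Tuple[int, ...] = (5, 10, 20, 30, 40)
    region_feat_dim: int = 2048     # d, original region proposal feature dimension
    region_embed_dim: int = 1024    # l, region proposal embedding dimension
    word_embed_dim: int = 512       # word embedding size
    rnn_hidden_size: int = 1024     # r
    gcn_layers: int = 3             # k
    eq4_lambda: float = 0.8         # lambda in Eq. 4


config = HyperParams()
```

Freezing the dataclass keeps the reported values from being mutated by accident during a sweep over `knn_p_candidates`.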