Convolutional Hierarchical Attention Network for Query-Focused Video Summarization
Authors: Shuwen Xiao, Zhou Zhao, Zijian Zhang, Xiaohui Yan, Min Yang (pp. 12426-12433)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the benchmark dataset demonstrate the competitive performance and show the effectiveness of our approach. |
| Researcher Affiliation | Collaboration | 1College of Computer Science and Technology, Zhejiang University, Hangzhou, China 2CBG Intelligent Engineering Dept., Huawei Technologies, China 3Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences |
| Pseudocode | No | The paper describes the method using text and a block diagram (Figure 2), but does not contain pseudocode or a clearly labeled algorithm block. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for the described methodology or a direct link to a code repository. |
| Open Datasets | Yes | We evaluate our method on the query-focused video summarization dataset proposed in (Sharghi, Laurel, and Gong 2017). ... Sharghi, A.; Laurel, J. S.; and Gong, B. 2017. Queryfocused video summarization: Dataset, evaluation, and a memory network based approach. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2127 2136. |
| Dataset Splits | No | Following the setting in (Sharghi, Laurel, and Gong 2017), we randomly select two videos for training, one for validation and the remaining one for testing. The split is selected randomly rather than fixed, and the paper does not state which videos fall into each split, so the partition is not reproducible. |
| Hardware Specification | Yes | We use Pytorch to implement our approach on a server with a GTX TITAN X card. |
| Software Dependencies | No | The paper mentions 'Pytorch' but does not provide a specific version number. Other software and pretrained models are referenced only by name or by the paper introducing them (e.g., ResNet, GloVe vectors, the Adam optimizer), without version details. |
| Experiment Setup | Yes | In the feature encoding layer, we propose a two-layer fully convolutional block, in which the output channel dimension for the first layer is 256 and for the second one is 512. In the local self-attention module and query-aware global attention module, the dimension of attention dc is set to 256. The dimension of the visual-textual fused space in the query-relevance computing module is 512. In the training process, we use the Adam optimizer (Kingma and Ba 2014) to minimize the loss, with an initial learning rate of 0.0001 and a decay rate of 0.8. The minibatch strategy is also used and the batch size is set to 5. |
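The hyperparameters quoted above (two convolutional layers with 256 and 512 output channels, Adam with learning rate 1e-4 and decay rate 0.8, batch size 5) can be sketched in PyTorch. This is a minimal illustration, not the authors' code: the kernel size, the 2048-dim input features (e.g., ResNet pooling output), and the use of `ExponentialLR` for the decay schedule are assumptions, since the paper does not specify them.

```python
import torch
import torch.nn as nn

class FeatureEncoder(nn.Module):
    """Hypothetical two-layer fully convolutional encoding block.

    Channel dimensions (256, then 512) follow the paper; kernel size
    and input feature dimension are assumptions for illustration.
    """
    def __init__(self, in_dim: int = 2048):
        super().__init__()
        self.conv1 = nn.Conv1d(in_dim, 256, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(256, 512, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim, num_shots) -> (batch, 512, num_shots)
        return self.relu(self.conv2(self.relu(self.conv1(x))))

encoder = FeatureEncoder()

# Optimizer settings quoted from the paper: Adam, initial lr 0.0001,
# decay rate 0.8. How often the decay is applied is not stated, so the
# exponential schedule below is an assumption.
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.8)

# Batch size 5, as in the paper; 20 shots per sequence is an assumption.
batch = torch.randn(5, 2048, 20)
out = encoder(batch)
```

After one `scheduler.step()`, the learning rate would drop from 1e-4 to 0.8e-4, matching the stated decay rate under the assumed schedule.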