Convolutional Hierarchical Attention Network for Query-Focused Video Summarization
Authors: Shuwen Xiao, Zhou Zhao, Zijian Zhang, Xiaohui Yan, Min Yang (pp. 12426-12433)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the benchmark dataset demonstrate the competitive performance and show the effectiveness of our approach. |
| Researcher Affiliation | Collaboration | 1College of Computer Science and Technology, Zhejiang University, Hangzhou, China 2CBG Intelligent Engineering Dept., Huawei Technologies, China 3Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences |
| Pseudocode | No | The paper describes the method using text and a block diagram (Figure 2), but does not contain pseudocode or a clearly labeled algorithm block. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for the described methodology or a direct link to a code repository. |
| Open Datasets | Yes | We evaluate our method on the query-focused video summarization dataset proposed in (Sharghi, Laurel, and Gong 2017). ... Sharghi, A.; Laurel, J. S.; and Gong, B. 2017. Queryfocused video summarization: Dataset, evaluation, and a memory network based approach. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2127 2136. |
| Dataset Splits | No | Following the setting in (Sharghi, Laurel, and Gong 2017), we randomly select two videos for training, one for validation and the remaining one for testing. The split is selected randomly rather than fixed, and the paper does not state which videos fall into each split, so the partition is not reproducible. |
| Hardware Specification | Yes | We use Pytorch to implement our approach on a server with a GTX TITAN X card. |
| Software Dependencies | No | The paper mentions 'Pytorch' but does not provide a specific version number. Other software and pretrained models are referenced only by name or by the paper introducing them (e.g., ResNet, GloVe vectors, the Adam optimizer), without version details. |
| Experiment Setup | Yes | In the feature encoding layer, we propose a two-layer fully convolutional block, in which the output channel dimension for the first layer is 256 and for the second one is 512. In the local self-attention module and query-aware global attention module, the dimension of attention dc is set to 256. The dimension of the visual-textual fused space in the query-relevance computing module is 512. In the training process, we use the Adam optimizer (Kingma and Ba 2014) to minimize the loss, with an initial learning rate of 0.0001 and a decay rate of 0.8. The minibatch strategy is also used and the batch size is set to 5. |
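The hyperparameters quoted above (two convolutional layers with 256 and 512 output channels, Adam with learning rate 1e-4 and decay rate 0.8, batch size 5) can be sketched in PyTorch. This is a minimal illustration, not the authors' code: the kernel size, the 2048-dim input features (e.g., ResNet pooling output), and the use of `ExponentialLR` for the decay schedule are assumptions, since the paper does not specify them.

```python
import torch
import torch.nn as nn

class FeatureEncoder(nn.Module):
    """Hypothetical two-layer fully convolutional encoding block.

    Channel dimensions (256, then 512) follow the paper; kernel size
    and input feature dimension are assumptions for illustration.
    """
    def __init__(self, in_dim: int = 2048):
        super().__init__()
        self.conv1 = nn.Conv1d(in_dim, 256, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(256, 512, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim, num_shots) -> (batch, 512, num_shots)
        return self.relu(self.conv2(self.relu(self.conv1(x))))

encoder = FeatureEncoder()

# Optimizer settings quoted from the paper: Adam, initial lr 0.0001,
# decay rate 0.8. How often the decay is applied is not stated, so the
# exponential schedule below is an assumption.
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.8)

# Batch size 5, as in the paper; 20 shots per sequence is an assumption.
batch = torch.randn(5, 2048, 20)
out = encoder(batch)
```

After one `scheduler.step()`, the learning rate would drop from 1e-4 to 0.8e-4, matching the stated decay rate under the assumed schedule.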