Multi-modal Circulant Fusion for Video-to-Language and Backward

Authors: Aming Wu, Yahong Han

IJCAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate MCF with tasks of video captioning and temporal activity localization via language (TALL). Experiments on MSVD and MSRVTT show our method obtains the state-of-the-art performance for video captioning. For TALL, by plugging into MCF, we achieve a performance gain of roughly 4.2% on TACoS.
Researcher Affiliation | Academia | Aming Wu and Yahong Han, School of Computer Science and Technology, Tianjin University, Tianjin, China. {tjwam, yahong}@tju.edu.cn
Pseudocode | No | The paper includes a flowchart (Figure 2) illustrating the detailed procedure of Multi-modal Circulant Fusion (MCF), but it does not present pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any links to source code, nor does it state that the code will be made publicly available.
Open Datasets | Yes | MSVD [Chen and Dolan, 2011] contains 1,970 video clips. MSRVTT [Xu et al., 2016] contains 10,000 video clips. TACoS dataset [Regneri et al., 2013].
Dataset Splits | Yes | For the MSVD dataset, we use 1,200 clips for training, 100 clips for validation, and 670 clips for testing. For the MSRVTT dataset, we use 6,513 clips for training, 497 clips for validation, and 2,990 clips for testing. For TACoS, we split it into 50% for training, 25% for validation, and 25% for testing.
Hardware Specification | No | The paper mentions using pre-trained convolutional networks such as GoogLeNet and ResNet152 for feature extraction, but it does not specify any hardware details such as GPU models, CPU types, or memory used for training or inference.
Software Dependencies | No | The paper mentions the Adam optimizer and refers to the GoogLeNet and ResNet152 models, but it does not specify any software dependencies with version numbers, such as programming languages or deep learning frameworks (e.g., PyTorch, TensorFlow).
Experiment Setup | Yes | For the multi-stage decoder, we use five dilated layers with dilation rates 1, 1, 2, 4, and 2. The number of filter channels is set to 512, 256, 256, 512, and 512, respectively. The width of the filter is set to 2. For MCF, we set W1 ∈ R^{256×512}, W2 ∈ R^{256×512} (in Eq. (1)), and W3 ∈ R^{256×512}. ... We use the Adam optimizer with an initial learning rate of 1×10^{-3}. We empirically set β1 and β2 to 0.9 and 0.1, respectively, and λ0, λ1, and λ2 are set to 0.2, 0.2, and 0.6, respectively.
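
Since the paper releases neither code nor pseudocode, a minimal sketch of the quoted setup may help a reproduction attempt. The framework (PyTorch), the 512-d input features, and the exact form of the circulant interaction are assumptions of this sketch; only the projection shapes (W1, W2 ∈ R^{256×512}) and the Adam learning rate of 1×10^{-3} come from the quoted setup.

```python
import torch
import torch.nn as nn

class CirculantFusion(nn.Module):
    """Generic sketch of circulant fusion between two modalities.

    Projection shapes follow the paper (W1, W2 in R^{256x512}); the
    interaction below (averaging matrix-vector products over all circular
    shifts) is one plausible reading, not the authors' implementation.
    The paper's W3 (same shape) serves a later step and is omitted here.
    """
    def __init__(self, in_dim=512, out_dim=256):
        super().__init__()
        self.W1 = nn.Linear(in_dim, out_dim, bias=False)  # visual projection
        self.W2 = nn.Linear(in_dim, out_dim, bias=False)  # textual projection

    @staticmethod
    def circulant(x):
        """Stack all circular shifts of the last dim into a (d, d) matrix."""
        d = x.shape[-1]
        return torch.stack([torch.roll(x, i, dims=-1) for i in range(d)], dim=-2)

    def forward(self, v, t):
        a, b = self.W1(v), self.W2(t)  # project both modalities to 256-d
        d = a.shape[-1]
        # every circular shift of one modality interacts with the other
        f = (self.circulant(a) @ b.unsqueeze(-1)).squeeze(-1) / d
        g = (self.circulant(b) @ a.unsqueeze(-1)).squeeze(-1) / d
        return f, g

mcf = CirculantFusion()
v = torch.randn(4, 512)  # e.g. pooled CNN clip features (hypothetical)
t = torch.randn(4, 512)  # e.g. sentence embedding (hypothetical)
f, g = mcf(v, t)
optimizer = torch.optim.Adam(mcf.parameters(), lr=1e-3)  # lr from the paper
```

The multi-stage decoder row likewise translates to a stack of dilated 1-D convolutions. The dilation rates, channel counts, and filter width of 2 follow the quoted setup; the causal left-padding, the ReLU activations, and the 512-d input are assumptions.

```python
def dilated_decoder(in_dim=512):
    """Five dilated conv layers per the quoted setup (a sketch)."""
    rates = [1, 1, 2, 4, 2]               # dilation rates from the paper
    channels = [512, 256, 256, 512, 512]  # filter channels from the paper
    layers, c_prev = [], in_dim
    for r, c in zip(rates, channels):
        layers += [
            nn.ConstantPad1d((r, 0), 0.0),  # causal left-pad preserves length
            nn.Conv1d(c_prev, c, kernel_size=2, dilation=r),  # filter width 2
            nn.ReLU(),  # activation choice is an assumption
        ]
        c_prev = c
    return nn.Sequential(*layers)

decoder = dilated_decoder()
y = decoder(torch.randn(4, 512, 20))  # (batch, channels, time) -> (4, 512, 20)
```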