Learning to Discretely Compose Reasoning Module Networks for Video Captioning

Authors: Ganchao Tan, Daqing Liu, Meng Wang, Zheng-Jun Zha

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on MSVD and MSR-VTT datasets demonstrate the proposed RMN outperforms the state-of-the-art methods while providing an explicit and explainable generation process."
Researcher Affiliation | Academia | Ganchao Tan (1), Daqing Liu (1), Meng Wang (2), and Zheng-Jun Zha (1). (1) University of Science and Technology of China; (2) Hefei University of Technology. Contact: {tgc1997, liudq}@mail.ustc.edu.cn, eric.mengwang@gmail.com, zhazj@ustc.edu.cn
Pseudocode | No | The paper describes the methodology in text and mathematical equations but does not include a formally labeled pseudocode or algorithm block.
Open Source Code | Yes | "Our code is available at https://github.com/tgc1997/RMN."
Open Datasets | Yes | "MSVD. The MSVD dataset [Chen and Dolan, 2011] consists of 1,970 short video clips selected from YouTube... MSR-VTT. The MSR-VTT [Xu et al., 2016] is a large-scale dataset for the open domain video captioning..."
Dataset Splits | Yes | "To be consistent with previous works, we split the dataset to 3 subsets, 1,200 clips for training, 100 clips for validation, and the remaining 670 clips for testing. [...] Following the existing works, we use the standard splits, namely 6,513 clips for training, 497 clips for validation, and 2,990 clips for testing." (See the split sketch after the table.)
Hardware Specification | No | The paper does not describe the specific hardware (e.g., GPU models, CPU types) used to run its experiments.
Software Dependencies | No | The paper mentions the "Spacy Tagging Tool" but does not provide version numbers for its software dependencies. (See the tagging sketch after the table.)
Experiment Setup | Yes | "Our model is optimized by Adam Optimizer [Kingma and Ba, 2015], the initial learning rate is set to 1e-4. For the MSVD dataset, the hidden size of the LSTM is 512 and the learning rate is divided by 10 every 10 epochs. For the MSR-VTT dataset, the hidden size of the LSTM is 1,300 and the learning rate is divided by 3 every 5 epochs. During testing, we use beam search with size 2 for the final caption generation." (See the training-configuration sketch after the table.)
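
To make the quoted split sizes concrete, here is a minimal Python sketch. It assumes clips are indexed 0..N-1 in the conventional contiguous ordering used by prior work; that ordering is an assumption of this sketch, not something stated in the quoted text.

```python
# Standard splits quoted in the Dataset Splits row, assuming clips are
# indexed 0..N-1 in the conventional contiguous order (an assumption here).
SPLITS = {
    "MSVD":    {"train": range(0, 1200), "val": range(1200, 1300), "test": range(1300, 1970)},
    "MSR-VTT": {"train": range(0, 6513), "val": range(6513, 7010), "test": range(7010, 10000)},
}

for name, split in SPLITS.items():
    sizes = {subset: len(indices) for subset, indices in split.items()}
    print(name, sizes)  # MSVD: 1200/100/670 clips, MSR-VTT: 6513/497/2990 clips
```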
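The Software Dependencies row notes that the paper names the spaCy tagging tool without pinning a version. The sketch below shows the kind of part-of-speech tagging such a tool provides for a caption; the model name `en_core_web_sm` is an assumption, since the paper does not specify which model or version was used.

```python
# Hedged sketch of part-of-speech tagging with spaCy. The paper names the
# tool but no version; "en_core_web_sm" is an assumption, not from the paper.
import spacy

nlp = spacy.load("en_core_web_sm")          # requires: python -m spacy download en_core_web_sm
doc = nlp("a man is playing the guitar")
for token in doc:
    print(token.text, token.pos_)            # e.g. ("man", "NOUN"), ("playing", "VERB")
```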
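The Experiment Setup row quotes all of the reported optimization hyperparameters. The PyTorch sketch below wires those numbers together; the `nn.LSTM` stand-in and the epoch count are placeholders, since the paper's RMN architecture and training length are not reproduced here.

```python
# Minimal PyTorch sketch of the reported optimization settings. Only the
# numbers (Adam, lr 1e-4, hidden sizes 512/1300, step decay, beam size 2)
# come from the paper; the model and epoch count are placeholders.
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR

DATASET = "MSVD"  # or "MSR-VTT"

if DATASET == "MSVD":
    hidden_size, step_size, gamma = 512, 10, 1.0 / 10   # lr divided by 10 every 10 epochs
else:
    hidden_size, step_size, gamma = 1300, 5, 1.0 / 3    # lr divided by 3 every 5 epochs

# Stand-in for the RMN captioner; a plain LSTM just to make the sketch runnable.
model = nn.LSTM(input_size=512, hidden_size=hidden_size, batch_first=True)

optimizer = Adam(model.parameters(), lr=1e-4)            # initial learning rate 1e-4
scheduler = StepLR(optimizer, step_size=step_size, gamma=gamma)

for epoch in range(30):                                  # epoch count not reported; illustrative
    # ... forward pass, captioning loss, and loss.backward() would go here ...
    optimizer.step()                                     # placeholder update
    scheduler.step()                                     # apply the step decay once per epoch

BEAM_SIZE = 2                                            # beam search width used at test time
```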