Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Multi-scale Hierarchical Residual Network for Dense Captioning
Authors: Yan Tian, Xun Wang, Jiachen Wu, Ruili Wang, Bailin Yang
JAIR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results have shown that our approach outperforms most current methods. In this section, we compare the efficiency and the performance of the proposed approach with others. We conduct experiments on a workstation with an Intel i7-4790 3.6 GHz CPU, 32GB memory, and an NVIDIA GTX Titan X graphics card. We conduct extensive ablation experiments and demonstrate the effects of several important components in our framework. All experiments in this subsection are performed on the Visual Genome V1.0 dataset. |
| Researcher Affiliation | Academia | Yan Tian EMAIL Xun Wang EMAIL Jiachen Wu EMAIL Ruili Wang EMAIL Bailin Yang EMAIL School of Computer Science and Information Engineering, Zhejiang Gongshang University, Hangzhou 310014, P.R.China |
| Pseudocode | No | The paper includes figures illustrating model architectures (Figure 1, 2, 3, 4) and mathematical equations, but no explicit pseudocode or algorithm blocks are present. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | Finally, the performance of the approach on the Visual Genome V1.0 dataset and the region labelled MS-COCO (Microsoft Common Objects in Context) dataset are demonstrated. We verified our proposed approach on the Visual Genome dataset(Krishna et al., 2017) and partial Microsoft Common Objects in Context (MS-COCO) dataset (Lin et al., 2014). |
| Dataset Splits | Yes | For the purpose of comparison, our experiments are mainly based on the Visual Genome V1.0 dataset. We use 77,398 images for training and 5,000 images for validation and testing, which is the same as the train/val/test splits in (Johnson et al., 2016). MS-COCO is the largest dataset regarding image captioning, with 82,783 images for training, 40,504 images for validation and 40,775 images for testing. |
| Hardware Specification | Yes | We conduct experiments on a workstation with an Intel i7-4790 3.6 GHz CPU, 32GB memory, and an NVIDIA GTX Titan X graphics card. |
| Software Dependencies | Yes | We build our algorithm upon Torch 7 (Collobert, Kavukcuoglu, & Farabet, 2011) to test the performance and computational efficiency. |
| Experiment Setup | Yes | The mini-batch size is 1, and each input image is first resized to a longer side of 720 pixels. We initialize Conv1 and Blocks 1-4 with weights that are pretrained on ImageNet (Deng et al., 2009) and all other weights from a Gaussian with a standard deviation of 0.01. Stochastic gradient descent is used. We set the momentum to 0.9, and the initial learning rate to 0.001, which is halved every 100k iterations. Weight decay is not employed in training. Fully connected layers (FC1 and FC2) have rectified linear units and are regularized with Dropout. An LSTM with 256 hidden nodes is employed for sequential modeling. We set α = 0.1 and β = 0.05 during experiments. |
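The reported optimization schedule (initial learning rate 0.001, halved every 100k SGD iterations) can be sketched as a small helper. This is an illustrative reconstruction only: the paper's experiments were built on Torch 7, and the function name and signature below are ours, not the authors'.

```python
def learning_rate(iteration, base_lr=0.001, halve_every=100_000):
    """Learning rate at a given SGD iteration under the reported schedule:
    start at 0.001 and halve every 100k iterations. (Momentum 0.9 and the
    absence of weight decay are optimizer settings, handled separately.)"""
    return base_lr * 0.5 ** (iteration // halve_every)

# e.g. learning_rate(0) -> 0.001, learning_rate(250_000) -> 0.00025
```

After 300k iterations the rate has dropped to 0.000125, i.e. one eighth of its initial value.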