Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation

Authors: Jing Wang, Yingwei Pan, Ting Yao, Jinhui Tang, Tao Mei

IJCAI 2019

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Extensive experiments are conducted on Stanford image paragraph dataset, and superior results are reported when comparing to state-of-the-art approaches. More remarkably, CAE-LSTM increases CIDEr performance from 20.93% to 25.15%." |
| Researcher Affiliation | Collaboration | Jing Wang¹, Yingwei Pan², Ting Yao², Jinhui Tang¹ and Tao Mei²; ¹School of Computer Science and Engineering, Nanjing University of Science and Technology, China; ²JD AI Research, Beijing, China |
| Pseudocode | No | The paper describes the architecture and processes of CAE and CAE-LSTM but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement about releasing source code, nor a link to a code repository for the described methodology. |
| Open Datasets | Yes | "We conducted the experiments and evaluated our CAE-LSTM on Stanford image paragraph dataset (Stanford) [Krause et al., 2017], a benchmark in the field of image paragraph generation." |
| Dataset Splits | Yes | "In our experiments, we follow the widely used settings in [Krause et al., 2017] and take 14,575 images for training, 2,487 for validation and 2,489 for testing." |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types, or cloud instances) used to run the experiments. |
| Software Dependencies | No | The paper mentions software components such as Faster R-CNN, VGG16, and LSTM, and refers to the Microsoft COCO Evaluation Server for metrics, but it does not give version numbers for any key software dependency (e.g., "Python 3.8", "PyTorch 1.9"). |
| Experiment Setup | Yes | Settings: "For each image, we apply Faster R-CNN to detect objects within this image and select top M = 50 regions with highest detection confidences to represent the image... The maximum sentence number K is 6 and the maximum word number in a sentence is 20 (padded where necessary). For our CAE, the convolutional filter size in the convolutional layer is set as C1 = 26 with stride C2 = 2. The dimensions of the embedded region-level feature and distilled topic vector are set as D1 = 1,024 and D2 = 500. For the two-level LSTM networks, the dimension of hidden state in each LSTM is H = 1,000. The dimension of the hidden layer for measuring attention distribution is D3 = 512." Implementation Details: "...we set the learning rate as 1×10^-4... For the second phase of self-critical training, the learning rate is set as 5×10^-6... The tradeoff parameter β is set as 8 according to the validation performance. Note that Batch normalization [Ioffe and Szegedy, 2015] and dropout [Srivastava et al., 2014] (dropout rate: 0.5) are applied in our experiments." |
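The hyperparameters and dataset splits reported in the table can be gathered into a single configuration sketch, which also checks that the reported splits sum to the 19,551 images of the Stanford dataset. This is purely illustrative: the paper releases no code, so every name below is an assumption, and only the numeric values come from the quoted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CAELSTMConfig:
    """Hypothetical container for the settings quoted from the paper."""
    # Region features (Faster R-CNN detections per image)
    num_regions: int = 50            # M
    region_dim: int = 1024           # D1, embedded region-level feature
    topic_dim: int = 500             # D2, distilled topic vector
    # Paragraph shape
    max_sentences: int = 6           # K
    max_words_per_sentence: int = 20
    # CAE convolutional layer
    conv_filter_size: int = 26       # C1 (as reported)
    conv_stride: int = 2             # C2
    # Two-level LSTM decoder
    lstm_hidden_dim: int = 1000      # H
    attention_hidden_dim: int = 512  # D3
    # Optimization
    lr_cross_entropy: float = 1e-4   # first training phase
    lr_self_critical: float = 5e-6   # second (self-critical) phase
    beta: float = 8.0                # tradeoff parameter, chosen on validation
    dropout: float = 0.5


cfg = CAELSTMConfig()

# Dataset splits reported for the Stanford image paragraph dataset
splits = {"train": 14_575, "val": 2_487, "test": 2_489}
total_images = sum(splits.values())
print(total_images)  # 19551
```

A frozen dataclass is a deliberate choice here: it gives reproducibility reports an immutable, hashable record of the exact settings being assessed.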