Swell-and-Shrink: Decomposing Image Captioning by Transformation and Summarization
Authors: Hanzhang Wang, Hanli Wang, Kaisheng Xu
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate the effectiveness of the proposed method. |
| Researcher Affiliation | Academia | Hanzhang Wang, Hanli Wang, Kaisheng Xu; Department of Computer Science and Technology, Tongji University, Shanghai, P. R. China |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing code or a link to a code repository. |
| Open Datasets | Yes | The proposed method is evaluated on the benchmark dataset MSCOCO [Lin et al., 2014] |
| Dataset Splits | Yes | We follow the widely adopted train/val/test split as in [Karpathy and Fei-Fei, 2015], i.e., 5000 images for both validation and testing, and the rest for training. (A hedged split-loading sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments (e.g., GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper mentions software components like Faster R-CNN, VGG16, and ADAM optimizer but does not specify their version numbers or other crucial software dependencies required for replication. |
| Experiment Setup | Yes | In the language model, the number of hidden units and the number of factors in each LSTM are all set to 512. A gradient will be clipped if its value exceeds 1. The ADAM optimizer is used for training with α = 0.8, β = 0.999 and ϵ = 1 × 10⁻⁸. The initial learning rate is set to 1 × 10⁻⁴ and exponential reduction is used which halves the learning rate every 10 epochs. (A hedged training-configuration sketch follows the table.) |
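
The dataset-splits row reports the standard Karpathy split (5,000 validation images, 5,000 test images, and the remainder for training). The minimal sketch below shows one common way to materialize that split; it assumes the widely circulated `dataset_coco.json` split file and its `split`/`filename` fields, which are not provided by the paper itself.

```python
import json
from collections import defaultdict

# Assumed input: the commonly shared Karpathy split file for MSCOCO
# (dataset_coco.json), where each image record carries a "split" field.
# The filename and field names are assumptions, not taken from the paper.
with open("dataset_coco.json") as f:
    data = json.load(f)

splits = defaultdict(list)
for img in data["images"]:
    # Many captioning implementations fold the "restval" images into training,
    # which yields 5000 val / 5000 test and the rest for training.
    split = "train" if img["split"] == "restval" else img["split"]
    splits[split].append(img["filename"])

for name, items in sorted(splits.items()):
    print(name, len(items))
```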
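
The experiment-setup row quotes the optimizer and schedule settings. The sketch below is a minimal PyTorch rendering of those settings, not the authors' code: the model is a stand-in LSTM with 512 hidden units, the loss is a placeholder, and reading the quoted α/β as Adam's (β1, β2) is an assumption on our part.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import StepLR

# Stand-in language model: a single LSTM with 512 hidden units, mirroring the
# reported hidden size. The paper's factored-LSTM decoder is NOT reproduced
# here; this only scaffolds the optimizer and schedule settings.
model = nn.LSTM(input_size=512, hidden_size=512, batch_first=True)

# Adam with the quoted hyperparameters; interpreting alpha/beta as Adam's
# (beta1, beta2) is an assumption.
optimizer = optim.Adam(model.parameters(), lr=1e-4, betas=(0.8, 0.999), eps=1e-8)

# "Exponential reduction ... halves the learning rate every 10 epochs."
scheduler = StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):                      # epoch count is illustrative
    for _ in range(10):                      # dummy batches; replace with real data
        x = torch.randn(16, 20, 512)         # (batch, seq_len, feature)
        out, _ = model(x)
        loss = out.pow(2).mean()             # placeholder loss, not the paper's objective
        optimizer.zero_grad()
        loss.backward()
        # "A gradient will be clipped if its value exceeds 1" -- read here as
        # per-element value clipping; norm clipping is another possible reading.
        nn.utils.clip_grad_value_(model.parameters(), clip_value=1.0)
        optimizer.step()
    scheduler.step()
```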