Adaptively Aligned Image Captioning via Adaptive Attention Time
Authors: Lun Huang, Wenmin Wang, Yaxian Xia, Jie Chen
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our proposed method on the popular MS COCO dataset [16]. MS COCO dataset contains 123,287 images labeled with at least 5 captions, including 82,783 for training and 40,504 for validation. MS COCO also provides 40,775 images as the test set for online evaluation. We use the Karpathy data split [13] for the performance comparisons, where 5,000 images are used for validation, 5,000 images for testing, and the rest for training. |
| Researcher Affiliation | Academia | School of Electronic and Computer Engineering, Peking University; Peng Cheng Laboratory; Macau University of Science and Technology |
| Pseudocode | No | The paper provides mathematical equations and model diagrams (Figure 1) but does not contain a dedicated pseudocode or algorithm block. |
| Open Source Code | Yes | Code is available at https://github.com/husthuaan/AAT. |
| Open Datasets | Yes | We evaluate our proposed method on the popular MS COCO dataset [16]. MS COCO dataset contains 123,287 images labeled with at least 5 captions, including 82,783 for training and 40,504 for validation. |
| Dataset Splits | Yes | We use the Karpathy data split [13] for the performance comparisons, where 5,000 images are used for validation, 5,000 images for testing, and the rest for training. (A minimal sketch of these split sizes appears after the table.) |
| Hardware Specification | No | The paper mentions using a pre-trained Faster-RCNN [20] model to extract features but does not provide specific details on the hardware used for training or evaluating their proposed model. |
| Software Dependencies | No | The paper mentions the use of 'ADAM [14] optimizer' and 'LSTM layers', but does not provide specific version numbers for any software dependencies or libraries (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We train our model under cross-entropy loss for 20 epochs with a minibatch size of 10, and ADAM [14] optimizer is used with a learning rate initialized with 1e-4 and annealed by 0.8 every 2 epochs. We increase the probability of feeding back a sample of the word posterior by 0.05 every 3 epochs [4]. Then we use self-critical sequence training (SCST) [21] to optimize the CIDEr-D score with REINFORCE for another 20 epochs with an initial learning rate of 1e-5 and annealed by 0.5 when the CIDEr-D score on the validation split has not improved for some training steps. (A hedged sketch of this training schedule appears after the table.) |
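
To illustrate the split sizes quoted in the Dataset Splits row, the following minimal Python sketch partitions a list of MS COCO image ids into 5,000 validation, 5,000 test, and remaining training images. The function name `karpathy_split` and the sequential slicing are illustrative assumptions; the real Karpathy split assigns specific image ids from a published list rather than slicing in order.

```python
# Minimal sketch of the Karpathy split sizes: 5,000 validation images,
# 5,000 test images, and the rest of the 123,287 MS COCO train+val images
# for training. Sequential slicing is a simplification; the actual split
# uses a fixed, published assignment of image ids.
def karpathy_split(image_ids, n_val=5000, n_test=5000):
    val = image_ids[:n_val]
    test = image_ids[n_val:n_val + n_test]
    train = image_ids[n_val + n_test:]
    return train, val, test

if __name__ == "__main__":
    ids = list(range(123287))  # 82,783 train + 40,504 val images
    train, val, test = karpathy_split(ids)
    print(len(train), len(val), len(test))  # 113287 5000 5000
```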
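The Experiment Setup row quotes the paper's cross-entropy training schedule. The sketch below, assuming a PyTorch implementation, maps those hyperparameters (20 epochs, minibatch size 10, Adam at 1e-4 annealed by 0.8 every 2 epochs, scheduled-sampling probability increased by 0.05 every 3 epochs) onto a standard optimizer and step-decay scheduler. The helper names, the 0.25 cap on the sampling probability, and the elided training loop body are assumptions, not taken from the authors' released code.

```python
# Hedged sketch of the cross-entropy training schedule described in the
# Experiment Setup row. Only the hyperparameters come from the paper;
# everything else (names, the 0.25 cap) is an assumption.
import torch

def build_xe_schedule(model_parameters):
    # Adam with an initial learning rate of 1e-4, annealed by 0.8 every 2 epochs.
    optimizer = torch.optim.Adam(model_parameters, lr=1e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.8)
    return optimizer, scheduler

def scheduled_sampling_prob(epoch, step=0.05, every=3, max_prob=0.25):
    # Probability of feeding back a sampled word instead of the ground truth,
    # increased by 0.05 every 3 epochs; the 0.25 ceiling is an assumption.
    return min(max_prob, (epoch // every) * step)

if __name__ == "__main__":
    dummy_params = [torch.nn.Parameter(torch.zeros(1))]  # placeholder model parameters
    optimizer, scheduler = build_xe_schedule(dummy_params)
    for epoch in range(20):
        ss_prob = scheduled_sampling_prob(epoch)
        lr = optimizer.param_groups[0]["lr"]
        print(f"epoch {epoch:2d}  lr={lr:.2e}  ss_prob={ss_prob:.2f}")
        # ... one epoch of cross-entropy training with minibatch size 10 ...
        scheduler.step()
```

The subsequent SCST stage (another 20 epochs with REINFORCE on CIDEr-D, starting at 1e-5 and halving the learning rate when validation CIDEr-D plateaus) would replace the cross-entropy loss and scheduler above, but is omitted here since the paper does not specify the plateau patience.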