Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Adaptively Aligned Image Captioning via Adaptive Attention Time
Authors: Lun Huang, Wenmin Wang, Yaxian Xia, Jie Chen
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our proposed method on the popular MS COCO dataset [16]. MS COCO dataset contains 123,287 images labeled with at least 5 captions, including 82,783 for training and 40,504 for validation. MS COCO also provides 40,775 images as the test set for online evaluation. We use the Karpathy data split [13] for the performance comparisons, where 5,000 images are used for validation, 5,000 images for testing, and the rest for training. |
| Researcher Affiliation | Academia | (1) School of Electronic and Computer Engineering, Peking University; (2) Peng Cheng Laboratory; (3) Macau University of Science and Technology |
| Pseudocode | No | The paper provides mathematical equations and model diagrams (Figure 1) but does not contain a dedicated pseudocode or algorithm block. |
| Open Source Code | Yes | Code is available at https://github.com/husthuaan/AAT. |
| Open Datasets | Yes | We evaluate our proposed method on the popular MS COCO dataset [16]. MS COCO dataset contains 123,287 images labeled with at least 5 captions, including 82,783 for training and 40,504 for validation. |
| Dataset Splits | Yes | We use the Karpathy data split [13] for the performance comparisons, where 5,000 images are used for validation, 5,000 images for testing, and the rest for training. |
| Hardware Specification | No | The paper mentions using a pre-trained Faster-RCNN [20] model to extract features but does not provide specific details on the hardware used for training or evaluating their proposed model. |
| Software Dependencies | No | The paper mentions the use of 'ADAM [14] optimizer' and 'LSTM layers', but does not provide specific version numbers for any software dependencies or libraries (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We train our model under cross-entropy loss for 20 epochs with a minibatch size of 10, and ADAM [14] optimizer is used with a learning rate initialized with 1e-4 and annealed by 0.8 every 2 epochs. We increase the probability of feeding back a sample of the word posterior by 0.05 every 3 epochs [4]. Then we use self-critical sequence training (SCST) [21] to optimize the CIDEr-D score with REINFORCE for another 20 epochs with an initial learning rate of 1e-5 and annealed by 0.5 when the CIDEr-D score on the validation split has not improved for some training steps. |
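
The numbers quoted in the Dataset Splits and Experiment Setup rows can be turned into a small worked example. The following is a minimal sketch, not the authors' code: every function name is hypothetical, epochs are assumed to be 0-indexed, and the scheduled-sampling probability is assumed to start at 0 (the quoted text does not state a starting value or a cap). The SCST stage is left out because its learning-rate decay depends on a validation plateau whose patience is not specified.

```python
# Sketch of the split sizes and the cross-entropy training schedule quoted above.
# All names are hypothetical. Assumptions: epochs are 0-indexed and the
# scheduled-sampling probability starts at 0.

COCO_TOTAL_IMAGES = 82_783 + 40_504  # official train + val images (123,287)
KARPATHY_VAL = 5_000
KARPATHY_TEST = 5_000


def karpathy_train_size(total=COCO_TOTAL_IMAGES,
                        val=KARPATHY_VAL,
                        test=KARPATHY_TEST):
    """Images left for training once validation and test are held out."""
    return total - val - test


def learning_rate(epoch, base_lr=1e-4, decay=0.8, every=2):
    """Step decay: multiply the learning rate by `decay` every `every` epochs."""
    return base_lr * decay ** (epoch // every)


def scheduled_sampling_prob(epoch, step=0.05, every=3):
    """Probability of feeding back a sampled word, raised by `step` every `every` epochs."""
    return step * (epoch // every)


if __name__ == "__main__":
    print("training images:", karpathy_train_size())
    for epoch in range(20):  # 20 cross-entropy epochs, as quoted
        print(epoch, learning_rate(epoch), scheduled_sampling_prob(epoch))
```

Under these assumptions the learning rate changes ten times over the 20 cross-entropy epochs and the sampling probability changes seven times. A different indexing convention would shift where each step falls.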