Noise-Aware Image Captioning with Progressively Exploring Mismatched Words

Authors: Zhongtian Fu, Kefei Song, Luping Zhou, Yang Yang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted on the MS-COCO and Conceptual Caption datasets validate the effectiveness of our method in various noisy scenarios.
Researcher Affiliation | Academia | (1) Nanjing University of Science and Technology, Nanjing 210094, China; (2) The University of Sydney, Sydney 2052, Australia
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures).
Open Source Code | Yes | The code is available at https://github.com/njustkmg/NIC.
Open Datasets | Yes | We validate our method using the MS-COCO dataset (Lin et al. 2014) and the Conceptual Caption dataset (Sharma et al. 2018) collected from the Internet.
Dataset Splits | Yes | Using the Karpathy splitting approach (Karpathy and Fei-Fei 2017), 5,000 images are allocated for validation, 5,000 for testing, and the remainder for training. ... 1,000 images are allocated for validation, 1,000 for testing, and the remainder for training.
Hardware Specification | Yes | The entire network is trained on an NVIDIA TITAN X GPU.
Software Dependencies | No | The paper mentions that 'ADAM (Kingma and Ba 2015) is used as the optimizer' but does not specify any software names with version numbers for libraries, frameworks, or languages (e.g., Python version, PyTorch/TensorFlow version).
Experiment Setup | Yes | In NIC, we set the model embedding size d_model to 512, the number of transformer heads to 8, and the number of refinement encoder and decoder blocks to 3. ADAM (Kingma and Ba 2015) is used as the optimizer, with training epochs set to 25 and 30 for the two stages, and a batch size of 10. The initial learning rate is 5e-6. The hyperparameters C and τ are initially set to 0.5 and 10 and adaptively adjusted once per epoch based on the curvature of the loss curve from the previous epoch.