Perturb, Predict & Paraphrase: Semi-Supervised Learning using Noisy Student for Image Captioning

Authors: Arjit Jain, Pranay Reddy Samala, Preethi Jyothi, Deepak Mittal, Maneesh Singh

IJCAI 2021

Reproducibility assessment, listing each variable, its result, and the supporting LLM response:
Research Type: Experimental
"In this work, we provide an in-depth analysis of the noisy student SSL framework for the task of image captioning and derive state-of-the-art results. Our final results in the limited labeled data setting (1% of the MS-COCO labeled data) outperform previous state-of-the-art approaches by 2.5 on BLEU4 and 11.5 on CIDEr scores."
Researcher Affiliation: Collaboration
"1 Indian Institute of Technology Bombay, 2 Verisk Analytics. {arjit,pranayr,pjyothi}@cse.iitb.ac.in, {deepak.mittal,maneesh.singh}@verisk.com"
Pseudocode: Yes
"Algorithm 1 Noisy Student Training for Captioning. Input: N, L, UI, UC, Paraphraser P, Student model S, Teacher model T"
Open Source Code: Yes
"Code, models, and datasets will be made publicly available at https://github.com/csalt-research/perturb-predict-paraphrase."
Open Datasets: Yes
"We conduct experiments on the MSCOCO dataset [Lin et al., 2014], the standard benchmark used for image captioning. ... For unlabeled data, we use the Unlabeled COCO split from the official MSCOCO Caption challenge."
Dataset Splits: Yes
"We adopt the standard Karpathy split used in all prior work, with 113k images used in training, and 5k images each used for validation and testing."
Hardware Specification: No
No hardware details (GPU or CPU models, processor types, or memory capacity) are provided for running the experiments. The paper mentions models such as Faster-RCNN and BART, implying substantial computational resources, but lists no specific hardware.
Software Dependencies: No
The paper mentions software components such as Attention on Attention Network (AoANet), Faster-RCNN, BART, and BERT, but it specifies no version numbers for these or for underlying frameworks such as PyTorch, TensorFlow, or Python.
Experiment Setup: Yes
"Beam decoding is used for evaluation with the beam width set to 5. ... Unless specified otherwise, we use beam decoding to generate pseudo labels with a beam width of 2. For the teacher model, we use model dropout with probability p = 0.3, no object dropout and label smoothing with probability 0.1. The student model is randomly initialized, and trained from scratch. The labeled batch size is 16, with 5 captions per image, and the unlabeled batch size is 96 with 1 caption per image. The number of noisy student iterations N = 1."
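The pseudocode and setup entries above describe a teacher-student loop: train a teacher on labeled data, pseudo-label the unlabeled images with beam search, then train a noised student on the union, for N iterations (N = 1 in the paper). A minimal, purely illustrative sketch of that control flow is below; the helper names and the toy "model = integer" stand-ins are assumptions, not the authors' implementation (see the GitHub repository above for the real code).

```python
def train_on(init, data):
    # Toy stand-in for supervised captioning training:
    # a "model" here is just an integer counting the examples it was trained on.
    return init + len(data)

def pseudo_label(model, images, beam_width=2):
    # Stand-in for generating pseudo captions with beam search
    # (the paper uses beam width 2 for pseudo-labeling).
    return [(img, "caption-{}-{}".format(model, img)) for img in images]

def noisy_student(L, U, N=1):
    teacher = train_on(0, L)              # 1. train teacher on labeled data L
    student = teacher
    for _ in range(N):                    # 2. repeat for N noisy-student iterations
        pseudo = pseudo_label(teacher, U) # 3. teacher pseudo-labels unlabeled images U
        student = train_on(0, L + pseudo) # 4. train a fresh (noised) student on both
        teacher = student                 # 5. student becomes the next teacher
    return student
```

In the real system, step 4 is where the "perturb" noise enters (model dropout p = 0.3, label smoothing 0.1, a randomly initialized student), and the paraphraser P augments the pseudo captions.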