Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning

Authors: Riccardo Del Chiaro, Bartłomiej Twardowski, Andrew Bagdanov, Joost van de Weijer

NeurIPS 2020 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. "We apply our approaches to the incremental image captioning problem on two new continual learning benchmarks we define using the MS-COCO and Flickr30K datasets. Our results demonstrate that RATT is able to sequentially learn five captioning tasks while incurring no forgetting of previously learned ones."
Researcher Affiliation: Academia. Riccardo Del Chiaro (MICC, University of Florence, Florence 50134, FI, Italy) EMAIL; Bartłomiej Twardowski (CVC, Universitat Autònoma de Barcelona, 08193 Barcelona, Spain) EMAIL; Andrew D. Bagdanov (MICC, University of Florence, Florence 50134, FI, Italy) EMAIL; Joost van de Weijer (CVC, Universitat Autònoma de Barcelona, 08193 Barcelona, Spain) EMAIL
Pseudocode: No. The paper presents mathematical equations and procedures, such as LSTM definitions and attention-mask applications, but does not include a distinct block of pseudocode or a clearly labeled algorithm.
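For context, the attention-mask mechanism mentioned in the response follows the HAT-style gating that RATT builds on: a learned per-task embedding is passed through a scaled sigmoid to produce a near-binary mask over units. A minimal sketch (function names and toy inputs are illustrative, not the authors' code):

```python
import math

def task_mask(task_embedding, s):
    """Per-unit sigmoid gate; as the scale s grows, the mask approaches binary {0, 1}."""
    return [1.0 / (1.0 + math.exp(-s * e)) for e in task_embedding]

def apply_mask(hidden, mask):
    """Gate hidden activations element-wise with the task mask."""
    return [h * m for h, m in zip(hidden, mask)]

# Toy example: positive embeddings open the gate, negative ones close it.
mask = task_mask([2.0, -2.0, 0.5], s=50.0)
gated = apply_mask([1.0, 1.0, 1.0], mask)
```

With a large scale the gate saturates, which is what lets masks act as (approximately) binary unit allocations per task.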
Open Source Code: Yes. "Code for experiments available here: https://github.com/delchiaro/RATT"
Open Datasets: Yes. "We applied all techniques on the Flickr30K [31] and MS-COCO [20] captioning datasets (...)" [31] Bryan A. Plummer, Liwei Wang, Christopher M. Cervantes, Juan C. Caicedo, Julia Hockenmaier, and Svetlana Lazebnik. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. IJCV, 123(1):74–93, 2017. [20] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014.
Dataset Splits: Yes. "Task: transport — Train: 14,266, Valid: 3,431, Test: 3,431, Vocab (words): 3,116 (...) Models were trained for a fixed number of epochs and the best model according to BLEU-4 performance on the validation set was chosen for each task."
Hardware Specification: Yes. "We thank NVIDIA Corporation for donating the Titan XP GPU that was used to conduct the experiments."
Software Dependencies: No. The paper mentions using "PyTorch" and the "Adam [16] optimizer" but does not specify version numbers for any software components.
Experiment Setup: Yes. "All experiments were conducted using PyTorch, networks were trained using the Adam [16] optimizer, and all hyperparameters were tuned over validation sets. Batch size, learning rate, and max-decode length for evaluation were set, respectively, to 128, 4e-4, and 26 for MS-COCO, and 32, 1e-4, and 40 for Flickr30k. (...) We apply s = 1/s_max + (s_max − 1/s_max) · (b − 1)/(B − 1), where b is the batch index and B is the total number of batches in the epoch. We used s_max = 2000 and s_max = 400 for experiments on Flickr30k and MS-COCO, respectively."
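The annealing schedule quoted above can be written directly as a function. This is a sketch of the standard HAT-style schedule the formula describes (the function name is illustrative; b is the 1-based batch index and B the number of batches per epoch):

```python
def anneal_s(b, B, s_max):
    """Anneal the gate scale s from 1/s_max at the first batch to s_max at the last."""
    return 1.0 / s_max + (s_max - 1.0 / s_max) * (b - 1) / (B - 1)

# Over an epoch the scale sweeps from soft gating toward nearly binary gating.
start = anneal_s(1, 100, 400.0)   # 1/400 = 0.0025
end = anneal_s(100, 100, 400.0)   # 400.0
```

The sweep means early batches keep the sigmoid gates soft (so embeddings receive gradients) while late batches harden them toward the binary masks used at inference.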