RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning

Authors: Riccardo Del Chiaro, Bartłomiej Twardowski, Andrew Bagdanov, Joost van de Weijer

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply our approaches to incremental image captioning problem on two new continual learning benchmarks we define using the MS-COCO and Flickr30 datasets. Our results demonstrate that RATT is able to sequentially learn five captioning tasks while incurring no forgetting of previously learned ones.
Researcher Affiliation | Academia | Riccardo Del Chiaro, MICC, University of Florence, Florence 50134, FI, Italy, riccardo.delchiaro@unifi.it; Bartłomiej Twardowski, CVC, Universitat Autónoma de Barcelona, 08193 Barcelona, Spain, bartlomiej.twardowski@cvc.uab.es; Andrew D. Bagdanov, MICC, University of Florence, Florence 50134, FI, Italy, andrew.bagdanov@unifi.it; Joost van de Weijer, CVC, Universitat Autónoma de Barcelona, 08193 Barcelona, Spain, joost@cvc.uab.es
Pseudocode | No | The paper describes mathematical equations and processes, such as LSTM definitions and attention mask applications, but does not include a distinct block of pseudocode or a clearly labeled algorithm.
Open Source Code | Yes | Code for experiments available here: https://github.com/delchiaro/RATT
Open Datasets | Yes | We applied all techniques on the Flickr30K [31] and MS-COCO [20] captioning datasets (...) [31] Bryan A. Plummer, Liwei Wang, Christopher M. Cervantes, Juan C. Caicedo, Julia Hockenmaier, and Svetlana Lazebnik. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. IJCV, 123(1):74-93, 2017. [20] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740-755. Springer, 2014.
Dataset Splits | Yes | Task: transport, Train: 14,266, Valid: 3,431, Test: 3,431, Vocab (words): 3,116 (...) Models were trained for a fixed number of epochs and the best model according to BLEU-4 performance on the validation set was chosen for each task.
Hardware Specification | Yes | We thank NVIDIA Corporation for donating the Titan XP GPU that was used to conduct the experiments.
Software Dependencies | No | The paper mentions using "PyTorch" and the "Adam [16] optimizer" but does not specify version numbers for any software components.
Experiment Setup | Yes | All experiments were conducted using PyTorch, networks were trained using the Adam [16] optimizer, all hyperparameters were tuned over validation sets. Batch size, learning rate and max-decode length for evaluation were set, respectively, to 128, 4e-4, and 26 for MS-COCO, and 32, 1e-4 and 40 for Flickr30k. (...) We apply s = 1/s_max + (s_max - 1/s_max) * (b - 1)/(B - 1), where b is the batch index and B is the total number of batches for the epoch. We used s_max = 2000 and s_max = 400 for experiments on Flickr30k and MS-COCO, respectively.
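
To make the annealing schedule quoted in the Experiment Setup row concrete, here is a minimal sketch in PyTorch (the paper's stated framework). It is not the authors' released code (that lives at https://github.com/delchiaro/RATT): it implements s = 1/s_max + (s_max - 1/s_max) * (b - 1)/(B - 1) and, as an assumption based on the HAT-style hard-attention formulation RATT builds on, shows the annealed scale turning a learned task embedding into a near-binary gate via sigmoid(s * e_t). The names anneal_s and TaskMask, and the constants n_units and B, are illustrative, not taken from the paper.

import torch

def anneal_s(b, B, s_max):
    # s = 1/s_max + (s_max - 1/s_max) * (b - 1) / (B - 1) for batch b of B in one epoch.
    if B <= 1:
        return float(s_max)
    return 1.0 / s_max + (s_max - 1.0 / s_max) * (b - 1) / (B - 1)

class TaskMask(torch.nn.Module):
    # Learned per-task embedding turned into a (0, 1) gate via sigmoid(s * e_t) (assumed HAT-style).
    def __init__(self, n_tasks, n_units):
        super().__init__()
        self.emb = torch.nn.Embedding(n_tasks, n_units)

    def forward(self, task_id, s):
        e_t = self.emb(torch.tensor([task_id]))   # (1, n_units) task embedding
        return torch.sigmoid(s * e_t)             # near-binary gate for large s

# Illustrative usage: s is annealed from ~1/s_max (soft gate) at the start of an
# epoch to s_max (almost hard gate) at its end, per the quoted schedule.
mask = TaskMask(n_tasks=5, n_units=512)          # five captioning tasks, as in the benchmarks
B = 100                                          # batches per epoch (made-up value)
s_max = 400.0                                    # value quoted for MS-COCO
for b in range(1, B + 1):
    s = anneal_s(b, B, s_max)
    gate = mask(task_id=0, s=s)
    # hidden = gate * lstm_hidden                # the gate would multiply layer activations elementwise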