RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning
Authors: Riccardo Del Chiaro, Bartłomiej Twardowski, Andrew Bagdanov, Joost van de Weijer
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our approaches to the incremental image captioning problem on two new continual learning benchmarks we define using the MS-COCO and Flickr30K datasets. Our results demonstrate that RATT is able to sequentially learn five captioning tasks while incurring no forgetting of previously learned ones. |
| Researcher Affiliation | Academia | Riccardo Del Chiaro, MICC, University of Florence, Florence 50134, FI, Italy (riccardo.delchiaro@unifi.it); Bartłomiej Twardowski, CVC, Universitat Autònoma de Barcelona, 08193 Barcelona, Spain (bartlomiej.twardowski@cvc.uab.es); Andrew D. Bagdanov, MICC, University of Florence, Florence 50134, FI, Italy (andrew.bagdanov@unifi.it); Joost van de Weijer, CVC, Universitat Autònoma de Barcelona, 08193 Barcelona, Spain (joost@cvc.uab.es) |
| Pseudocode | No | The paper describes mathematical equations and processes, such as LSTM definitions and attention mask applications, but does not include a distinct block of pseudocode or a clearly labeled algorithm. (A sketch of the task-conditioned masking appears below the table.) |
| Open Source Code | Yes | Code for experiments available here: https://github.com/delchiaro/RATT |
| Open Datasets | Yes | We applied all techniques on the Flickr30K [31] and MS-COCO [20] captioning datasets (...) [31] Bryan A. Plummer, Liwei Wang, Christopher M. Cervantes, Juan C. Caicedo, Julia Hockenmaier, and Svetlana Lazebnik. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. IJCV, 123(1):74-93, 2017. [20] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740-755. Springer, 2014. |
| Dataset Splits | Yes | Task 'transport': 14,266 train / 3,431 validation / 3,431 test images, 3,116-word vocabulary (...) Models were trained for a fixed number of epochs, and the best model according to BLEU-4 performance on the validation set was chosen for each task. |
| Hardware Specification | Yes | We thank NVIDIA Corporation for donating the Titan XP GPU that was used to conduct the experiments. |
| Software Dependencies | No | The paper mentions using "PyTorch" and the "Adam [16] optimizer" but does not specify version numbers for any software components. |
| Experiment Setup | Yes | All experiments were conducted using PyTorch, networks were trained using the Adam [16] optimizer, and all hyperparameters were tuned over validation sets. Batch size, learning rate, and max-decode length for evaluation were set, respectively, to 128, 4e-4, and 26 for MS-COCO, and 32, 1e-4, and 40 for Flickr30k. (...) We apply $s = \frac{1}{s_{\max}} + \left(s_{\max} - \frac{1}{s_{\max}}\right)\frac{b-1}{B-1}$, where $b$ is the batch index and $B$ is the total number of batches in the epoch. We used $s_{\max} = 2000$ and $s_{\max} = 400$ for experiments on Flickr30k and MS-COCO, respectively. (See the annealing sketch below.) |
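
The attention-mask mechanism mentioned in the Pseudocode row is the HAT-style task-conditioned gating that RATT adapts to recurrent captioning layers: each task learns an embedding per gated layer, and its sigmoid acts as a nearly-binary mask over units. The sketch below illustrates that mechanism under those assumptions; the `TaskGate` class name and all sizes are hypothetical, and the authors' actual implementation is in the repository linked above.

```python
import torch
import torch.nn as nn

class TaskGate(nn.Module):
    """Hypothetical sketch of HAT-style task-conditioned attention masks.

    Each task t owns a learnable embedding e_t for a gated layer; the mask is
    a_t = sigmoid(s * e_t), which approaches a binary unit selection as the
    slope s grows large.
    """

    def __init__(self, num_tasks: int, num_units: int):
        super().__init__()
        self.task_embedding = nn.Embedding(num_tasks, num_units)

    def forward(self, task_id: torch.Tensor, s: float) -> torch.Tensor:
        # Near-binary mask for large s; soft (trainable) for small s.
        return torch.sigmoid(s * self.task_embedding(task_id))

# Usage: gate the hidden state of a captioning LSTM for one task.
gate = TaskGate(num_tasks=5, num_units=512)  # 5 tasks as in the paper; 512 is illustrative
h = torch.randn(1, 512)                      # LSTM hidden state (illustrative)
mask = gate(torch.tensor([0]), s=400.0)      # near-binary mask for task 0
h_masked = h * mask                          # only task-0 units pass through
```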
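
The slope-annealing schedule quoted in the Experiment Setup row can be checked with a few lines. This implements only the quoted formula; `anneal_slope` is a name chosen here for illustration, not from the authors' code.

```python
def anneal_slope(b: int, B: int, s_max: float) -> float:
    """Anneal the gate slope from 1/s_max (batch 1) to s_max (batch B),
    following s = 1/s_max + (s_max - 1/s_max) * (b - 1) / (B - 1)."""
    return 1.0 / s_max + (s_max - 1.0 / s_max) * (b - 1) / (B - 1)

# Paper values: s_max = 2000 for Flickr30k, s_max = 400 for MS-COCO.
assert abs(anneal_slope(1, 100, 400.0) - 1 / 400.0) < 1e-9  # start of epoch
assert abs(anneal_slope(100, 100, 400.0) - 400.0) < 1e-9    # end of epoch
```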