Prophet Attention: Predicting Attention with Future Attention

Authors: Fenglin Liu, Xuancheng Ren, Xian Wu, Shen Ge, Wei Fan, Yuexian Zou, Xu Sun

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments on the Flickr30k Entities and MSCOCO datasets show that the proposed Prophet Attention consistently outperforms baselines in both automatic metrics and human evaluations.
Researcher Affiliation | Collaboration | Fenglin Liu (1), Xuancheng Ren (2), Xian Wu (3), Shen Ge (3), Wei Fan (3), Yuexian Zou (1, 4), Xu Sun (2, 5). Affiliations: (1) ADSPLAB, School of ECE, Peking University; (2) MOE Key Laboratory of Computational Linguistics, School of EECS, Peking University; (3) Tencent, Beijing, China; (4) Peng Cheng Laboratory, Shenzhen, China; (5) Center for Data Science, Peking University.
Pseudocode | No | The paper includes equations and figures, but no explicit pseudocode or algorithm blocks are provided or labeled as such.
Open Source Code | No | The paper states "Our code is implemented in PyTorch [41]", but provides neither a link nor an explicit statement that the code is publicly released.
Open Datasets | Yes | We use the Flickr30k Entities [42] and the MSCOCO [7] image captioning datasets for evaluation.
Dataset Splits | Yes | The MSCOCO validation and test sets contain 5,000 images each; the corresponding number for Flickr30k Entities is 1,000 images.
Hardware Specification | Yes | All re-implementations and our experiments were run on V100 GPUs.
Software Dependencies | No | The paper mentions the "spaCy library [19]" and "PyTorch [41]" but does not specify version numbers for these software dependencies.
Experiment Setup | Yes | We set λ = 0.1, according to the average performance on the validation set. During training, we first pre-train the captioning model with Eq. (4) for 25 epochs and then use Eq. (8) to train the full model.
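The two-stage schedule quoted in the Experiment Setup row can be sketched as follows. This is a hypothetical illustration only: it assumes Eq. (4) is the captioning loss alone and Eq. (8) combines it with a λ-weighted attention-regularization term; the function name and the exact loss decomposition are assumptions, not taken from the paper.

```python
# Hypothetical sketch of the two-stage training schedule described above.
# Assumptions (not stated in this report): Eq. (4) is the captioning loss
# on its own, and Eq. (8) adds an attention-regularization term scaled by
# lambda. Only the constants 0.1 and 25 come from the report.

LAMBDA = 0.1          # lambda, chosen via validation-set performance
PRETRAIN_EPOCHS = 25  # epochs trained with Eq. (4) only


def training_loss(caption_loss: float, attention_loss: float, epoch: int) -> float:
    """Return the loss used at a given epoch under the two-stage schedule.

    Stage 1 (epoch < 25): captioning loss alone, i.e. Eq. (4).
    Stage 2 (epoch >= 25): captioning loss plus the lambda-weighted
    attention term, i.e. Eq. (8) under the assumed decomposition.
    """
    if epoch < PRETRAIN_EPOCHS:
        return caption_loss
    return caption_loss + LAMBDA * attention_loss
```

During stage 1 the attention term contributes nothing; from epoch 25 onward it is added with weight 0.1, so the captioning objective remains dominant throughout.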