Prophet Attention: Predicting Attention with Future Attention
Authors: Fenglin Liu, Xuancheng Ren, Xian Wu, Shen Ge, Wei Fan, Yuexian Zou, Xu Sun
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments on the Flickr30k Entities and the MSCOCO datasets show that the proposed Prophet Attention consistently outperforms baselines in both automatic metrics and human evaluations. |
| Researcher Affiliation | Collaboration | Fenglin Liu1, Xuancheng Ren2, Xian Wu3, Shen Ge3, Wei Fan3, Yuexian Zou1,4, Xu Sun2,5 1ADSPLAB, School of ECE, Peking University 2MOE Key Laboratory of Computational Linguistics, School of EECS, Peking University 3Tencent, Beijing, China 4Peng Cheng Laboratory, Shenzhen, China 5Center for Data Science, Peking University |
| Pseudocode | No | The paper includes equations and figures, but no explicit pseudocode or algorithm blocks are provided or labeled as such. |
| Open Source Code | No | The paper states "Our code is implemented in PyTorch [41]", but does not provide a link or explicit statement about the code being open-sourced or publicly available. |
| Open Datasets | Yes | We use the Flickr30k Entities [42] and the MSCOCO [7] image captioning datasets for evaluation. |
| Dataset Splits | Yes | The MSCOCO validation and test sets each contain 5,000 images; for Flickr30k Entities, each contains 1,000 images. |
| Hardware Specification | Yes | All re-implementations and our experiments were run on V100 GPUs. |
| Software Dependencies | No | The paper mentions "spaCy library [19]" and "PyTorch [41]" but does not specify version numbers for these software dependencies. |
| Experiment Setup | Yes | We set λ = 0.1 according to the average performance on the validation set. During training, we first pre-train the captioning model with Eq. (4) for 25 epochs and then use Eq. (8) to train the full model. |
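
The "Experiment Setup" row quotes a two-stage training schedule but does not reproduce Eq. (4) or Eq. (8). The following is a minimal sketch of that schedule only, assuming Eq. (4) is the standard word-level cross-entropy caption loss and Eq. (8) adds a λ-weighted Prophet Attention regularization term on top of it; `CaptioningModel`, `caption_loss`, `prophet_attention_loss`, and `train_loader` are hypothetical placeholders, and the optimizer choice and total epoch count are assumptions not stated in the paper excerpt above.

```python
import torch

LAMBDA = 0.1           # reported value of λ, chosen on the validation set
PRETRAIN_EPOCHS = 25   # reported pre-training length with Eq. (4)
TOTAL_EPOCHS = 50      # assumption: the total schedule length is not quoted here

model = CaptioningModel()                                   # hypothetical captioning model
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)   # assumed optimizer and learning rate

for epoch in range(TOTAL_EPOCHS):
    for images, captions in train_loader:                   # hypothetical data loader
        outputs = model(images, captions)
        # Stage 1 (epochs 0..24): plain caption loss, assumed to be Eq. (4)
        loss = caption_loss(outputs, captions)
        if epoch >= PRETRAIN_EPOCHS:
            # Stage 2: assumed form of Eq. (8), adding the λ-weighted
            # Prophet Attention term to train the full model
            loss = loss + LAMBDA * prophet_attention_loss(outputs)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

This sketch only illustrates how the quoted λ, 25-epoch pre-training, and loss switch fit together; the actual objectives and hyperparameters are defined in the paper itself.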