Latent Alignment and Variational Attention
Authors: Yuntian Deng, Yoon Kim, Justin Chiu, Demi Guo, Alexander Rush
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that for machine translation and visual question answering, inefficient exact latent variable models outperform standard neural attention, but these gains go away when using hard attention based training. |
| Researcher Affiliation | Academia | School of Engineering and Applied Sciences Harvard University Cambridge, MA, USA |
| Pseudocode | Yes | Algorithm 1 Variational Attention, Algorithm 2 Variational Relaxed Attention |
| Open Source Code | Yes | Our code is available at https://github.com/harvardnlp/var-attn/. |
| Open Datasets | Yes | For NMT we mainly use the IWSLT dataset [13]. ... To show that variational attention scales to large datasets, we also experiment on the WMT 2017 English-German dataset [8]... For VQA, we use the VQA 2.0 dataset. |
| Dataset Splits | Yes | For VQA, we use the VQA 2.0 dataset. As we are interested in intrinsic evaluation (i.e. log-likelihood) in addition to the standard VQA metric, we randomly select half of the standard validation set as the test set (since we need access to the actual labels). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models (e.g., NVIDIA Tesla or RTX), CPU models (e.g., Intel Xeon, AMD Ryzen), or specific cloud computing instance types used for the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used (e.g., 'PyTorch 1.9' or 'Python 3.8'). |
| Experiment Setup | Yes | The full architectures/hyperparameters for both NMT and VQA are given in Appendix B. ... For NMT we evaluate intrinsically on perplexity (PPL) (lower is better) and extrinsically on BLEU (higher is better), where for BLEU we perform beam search with beam size 10 and length penalty (see Appendix B for further details). |