Posterior Attention Models for Sequence to Sequence Learning

Authors: Shiv Shankar, Sunita Sarawagi

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically on five translation and two morphological inflection tasks the proposed posterior attention models yield better BLEU score and alignment accuracy than existing attention models.
Researcher Affiliation | Collaboration | Shiv Shankar, University of Massachusetts Amherst, sshankar@umass.edu; Sunita Sarawagi, IIT Bombay, sunita@iitb.ac.in ... Acknowledgements: We thank NVIDIA Corporation and Flipkart for supporting this research.
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code for its methodology, nor a link to a code repository. It mentions using the 'author's code' for a baseline model but not for its own.
Open Datasets | Yes | We experiment on five language pairs from three datasets: IWSLT15 English-Vietnamese, IWSLT14 German-English (Cettolo et al., 2015); and WAT17 Japanese-English (Nakazawa et al., 2016). The paper also uses the RWTH German-English dataset, which provides alignment information manually tagged by experts.
Dataset Splits | No | The paper mentions training models and evaluating at different beam sizes but does not specify the training, validation, or test dataset splits (e.g., percentages or counts) needed for reproduction. It uses well-known datasets that often have standard splits, but these are not explicitly stated within the paper.
Hardware Specification | No | The paper thanks 'NVIDIA Corporation' for support, implying the use of NVIDIA GPUs, but it does not specify any particular GPU model (e.g., NVIDIA A100, Tesla V100), CPU, or other hardware used for the experiments.
Software Dependencies | No | The paper mentions LSTM units and the SGD and Adam optimizers but does not provide version numbers for any software libraries or frameworks (e.g., PyTorch, TensorFlow, Python) that would be needed to reproduce the experiments.
Experiment Setup | Yes | We use a 2 layer bi-directional encoder and 2 layer decoder with 512 LSTM units and 0.2 dropout with vanilla SGD optimizer. We train a one layer encoder and decoder with 128 hidden LSTM units each with a dropout rate of 0.2 using Adam and measure 0/1 accuracy.
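The Experiment Setup row quotes concrete hyperparameters for the translation models. As an illustration only, the sketch below shows one plausible way such a configuration could be instantiated in PyTorch. The framework choice, vocabulary sizes, embedding dimension, and learning rate are assumptions (the paper does not state them), and the paper's posterior attention mechanism itself is not reproduced here.

```python
# Minimal sketch (not the authors' code): a 2-layer bi-directional LSTM encoder and
# a 2-layer LSTM decoder with 512 units, 0.2 dropout, and vanilla SGD, as quoted in
# the Experiment Setup row. Vocabulary sizes, embedding dimension, and learning rate
# are assumptions; the posterior attention mechanism is omitted.
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB_DIM, HIDDEN = 32000, 32000, 512, 512  # assumed sizes


class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(SRC_VOCAB, EMB_DIM)
        # 2-layer bi-directional encoder with 512 LSTM units and 0.2 dropout
        self.rnn = nn.LSTM(EMB_DIM, HIDDEN, num_layers=2,
                           bidirectional=True, dropout=0.2, batch_first=True)

    def forward(self, src_ids):
        # Returns per-token encoder states and the final (hidden, cell) states.
        return self.rnn(self.embed(src_ids))


class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(TGT_VOCAB, EMB_DIM)
        # 2-layer uni-directional decoder with 512 LSTM units and 0.2 dropout
        self.rnn = nn.LSTM(EMB_DIM, HIDDEN, num_layers=2,
                           dropout=0.2, batch_first=True)
        self.proj = nn.Linear(HIDDEN, TGT_VOCAB)

    def forward(self, tgt_ids, state=None):
        out, state = self.rnn(self.embed(tgt_ids), state)
        return self.proj(out), state  # per-token vocabulary logits


encoder, decoder = Encoder(), Decoder()
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.SGD(params, lr=1.0)  # "vanilla SGD"; learning rate assumed
```

The separate morphological inflection setup quoted in the same row (single-layer encoder/decoder, 128 hidden units, Adam) would follow the same pattern with those hyperparameters substituted.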