Decoding As Dynamic Programming For Recurrent Autoregressive Models

Authors: Najam Zaidi, Trevor Cohn, Gholamreza Haffari

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on the text infilling task over SWAG and Daily Dialogue datasets show that our decoding method is superior to strong competing decoding methods."
Researcher Affiliation | Academia | Najam Zaidi, Faculty of Information Technology, Monash University, Australia (syed.zaidi1@monash.edu); Trevor Cohn, School of Computing and Information Systems, University of Melbourne, Australia (t.cohn@unimelb.edu.au); Gholamreza Haffari, Faculty of Information Technology, Monash University, Australia (gholamreza.haffari@monash.edu)
Pseudocode | No | The paper describes algorithmic steps in narrative form but does not present structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Our methods and the baselines are implemented on top of OpenNMT (Klein et al., 2017)." Code: https://github.com/Najamxaidi/Decoding-as-a-dynamic-program-for-recurrent-autoregressive-models.git
Open Datasets | Yes | "We conduct experiments on two datasets: SWAG (Zellers et al., 2018) and Daily dialogue (Li et al., 2017)."
Dataset Splits | No | The paper mentions training and testing but does not explicitly provide details about validation dataset splits (e.g., percentages or counts).
Hardware Specification | No | The paper mentions using 'computational infrastructure' from MASSIVE but does not provide specific details on the hardware used, such as GPU/CPU models, memory, or the number of machines.
Software Dependencies | No | The paper states that methods are 'implemented on top of OpenNMT' but does not provide specific version numbers for OpenNMT or other software dependencies.
Experiment Setup | Yes | "All models are trained with a word embedding size and hidden dimension size of 512. We use the ADAM optimiser to train the models with an initial learning rate of 0.001. ... The models were trained for 10 epochs. We use the Nesterov optimiser with a learning rate of 0.1. ... All µ's for the penalty terms corresponding to different constraints are initialised with 0.5 and are multiplied by 1.2 after 5 iterations, and decoding was run for 10 iterations."
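The Experiment Setup row quotes concrete hyperparameters. As a rough illustration only, the sketch below collects them into a configuration and shows one reading of the penalty-coefficient schedule (µ initialised at 0.5 and multiplied by 1.2 after each block of 5 decoding iterations; whether the multiplier is applied once or repeatedly is an assumption here). All names are hypothetical and not taken from the authors' OpenNMT-based code.

```python
# Hypothetical summary of the reported experiment setup; names and structure
# are illustrative, not taken from the paper's released code.

TRAINING_CONFIG = {
    "word_embedding_size": 512,   # "word embedding size ... of 512"
    "hidden_size": 512,           # "hidden dimension size of 512"
    "optimizer": "adam",          # ADAM used for model training
    "learning_rate": 0.001,       # initial learning rate
    "epochs": 10,                 # "trained for 10 epochs"
}

DECODING_CONFIG = {
    "optimizer": "nesterov",      # Nesterov used during decoding
    "learning_rate": 0.1,
    "decode_iterations": 10,      # "decoding was run for 10 iterations"
    "mu_init": 0.5,               # penalty coefficients start at 0.5
    "mu_growth": 1.2,             # multiplied by 1.2 ...
    "mu_growth_every": 5,         # ... after 5 iterations (assumed: every 5)
}


def penalty_schedule(cfg=DECODING_CONFIG):
    """Yield the penalty coefficient mu at each decoding iteration."""
    mu = cfg["mu_init"]
    for it in range(1, cfg["decode_iterations"] + 1):
        yield it, mu
        if it % cfg["mu_growth_every"] == 0:
            mu *= cfg["mu_growth"]


if __name__ == "__main__":
    for it, mu in penalty_schedule():
        print(f"iteration {it:2d}: mu = {mu:.3f}")
```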