Decoding As Dynamic Programming For Recurrent Autoregressive Models
Authors: Najam Zaidi, Trevor Cohn, Gholamreza Haffari
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the text infilling task over SWAG and Daily Dialogue datasets show that our decoding method is superior to strong competing decoding methods. |
| Researcher Affiliation | Academia | Najam Zaidi Faculty of Information Technology Monash University, Australia syed.zaidi1@monash.edu Trevor Cohn School of Computing and Information Systems University of Melbourne, Australia t.cohn@unimelb.edu.au Gholamreza Haffari Faculty of Information Technology Monash University, Australia gholamreza.haffari@monash.edu |
| Pseudocode | No | The paper describes algorithmic steps in narrative form but does not present structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our methods and the baselines are implemented on top of OpenNMT (Klein et al., 2017). Repository: https://github.com/Najamxaidi/Decoding-as-a-dynamic-program-for-recurrent-autoregressive-models.git |
| Open Datasets | Yes | We conduct experiments on two datasets: SWAG (Zellers et al., 2018) and Daily dialogue (Li et al., 2017) |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly provide details about validation dataset splits (e.g., percentages or counts). |
| Hardware Specification | No | The paper mentions using 'computational infrastructure' from MASSIVE but does not provide specific details on the hardware used, such as GPU/CPU models, memory, or the number of machines. |
| Software Dependencies | No | The paper states that methods are 'implemented on top of Open NMT' but does not provide specific version numbers for Open NMT or other software dependencies. |
| Experiment Setup | Yes | All models are trained with a word embedding size and hidden dimension size of 512. We use the ADAM optimiser to train the models with an initial learning rate of 0.001. ... The models were trained for 10 epochs. We use the Nesterov optimiser with a learning rate of 0.1. ... All µ's for the penalty terms corresponding to different constraints are initialised with 0.5 and are multiplied by 1.2 after 5 iterations, and decoding was run for 10 iterations. (A configuration sketch based on these reported settings follows the table.) |
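
The Experiment Setup row above reports concrete hyperparameters, but no runnable configuration is given in the paper itself. The snippet below is a minimal sketch that collects those reported values in Python; the names `TrainingConfig`, `DecodingConfig`, and `penalty_schedule` are hypothetical and are not part of the authors' OpenNMT-based code. In particular, reading "multiplied by 1.2 after 5 iterations" as a periodic update every 5 decoding iterations is only one plausible interpretation of the quoted text.

```python
# Hypothetical sketch of the hyperparameters quoted in the table above.
# Nothing here is taken from the authors' repository; names and structure
# are assumptions made for illustration only.
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    word_embedding_size: int = 512   # reported embedding size
    hidden_size: int = 512           # reported hidden dimension
    optimizer: str = "adam"          # ADAM used for model training
    learning_rate: float = 0.001     # reported initial learning rate
    epochs: int = 10                 # models trained for 10 epochs


@dataclass
class DecodingConfig:
    optimizer: str = "nesterov"      # Nesterov used at decoding time
    learning_rate: float = 0.1       # reported decoding learning rate
    decode_iterations: int = 10      # decoding run for 10 iterations
    mu_init: float = 0.5             # initial penalty coefficient per constraint
    mu_growth: float = 1.2           # multiplier applied to the penalties
    mu_growth_every: int = 5         # assumed period for the multiplication


def penalty_schedule(cfg: DecodingConfig):
    """Yield the penalty coefficient mu at each decoding iteration,
    under the assumed reading of '0.5, multiplied by 1.2 after 5 iterations'."""
    mu = cfg.mu_init
    for it in range(cfg.decode_iterations):
        if it > 0 and it % cfg.mu_growth_every == 0:
            mu *= cfg.mu_growth
        yield mu


if __name__ == "__main__":
    for it, mu in enumerate(penalty_schedule(DecodingConfig())):
        print(f"decoding iteration {it}: mu = {mu:.3f}")
```

Run as a script, this prints the penalty value implied at each of the 10 decoding iterations, which makes the schedule quoted from the paper easy to inspect at a glance.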