Decoding As Dynamic Programming For Recurrent Autoregressive Models

Authors: Najam Zaidi, Trevor Cohn, Gholamreza Haffari

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on the text infilling task over SWAG and Daily Dialogue datasets show that our decoding method is superior to strong competing decoding methods."
Researcher Affiliation | Academia | Najam Zaidi, Faculty of Information Technology, Monash University, Australia (syed.zaidi1@monash.edu); Trevor Cohn, School of Computing and Information Systems, University of Melbourne, Australia (t.cohn@unimelb.edu.au); Gholamreza Haffari, Faculty of Information Technology, Monash University, Australia (gholamreza.haffari@monash.edu)
Pseudocode | No | The paper describes algorithmic steps in narrative form but does not present structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Our methods and the baselines are implemented on top of OpenNMT (Klein et al., 2017)." Code: https://github.com/Najamxaidi/Decoding-as-a-dynamic-program-for-recurrent-autoregressive-models.git
Open Datasets | Yes | "We conduct experiments on two datasets: SWAG (Zellers et al., 2018) and Daily dialogue (Li et al., 2017)."
Dataset Splits | No | The paper mentions training and testing but does not explicitly provide details about validation dataset splits (e.g., percentages or counts).
Hardware Specification | No | The paper mentions using 'computational infrastructure' from MASSIVE but does not provide specific details on the hardware used, such as GPU/CPU models, memory, or the number of machines.
Software Dependencies | No | The paper states that methods are 'implemented on top of OpenNMT' but does not provide specific version numbers for OpenNMT or other software dependencies.
Experiment Setup | Yes | "All models are trained with a word embedding size and hidden dimension size of 512. We use the ADAM optimiser to train the models with an initial learning rate of 0.001. ... The models were trained for 10 epochs. We use the Nesterov optimiser with a learning rate of 0.1. ... All µ's for the penalty terms corresponding to different constraints are initialised with 0.5 and are multiplied by 1.2 after 5 iterations, and decoding was run for 10 iterations."
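The Experiment Setup row quotes concrete hyperparameters. As a rough illustration only, the sketch below collects them into a configuration and shows one reading of the penalty-coefficient schedule (µ initialised at 0.5 and multiplied by 1.2 after each block of 5 decoding iterations; whether the multiplier is applied once or repeatedly is an assumption here). All names are hypothetical and not taken from the authors' OpenNMT-based code.

```python
# Hypothetical summary of the reported experiment setup; names and structure
# are illustrative, not taken from the paper's released code.

TRAINING_CONFIG = {
    "word_embedding_size": 512,   # "word embedding size ... of 512"
    "hidden_size": 512,           # "hidden dimension size of 512"
    "optimizer": "adam",          # ADAM used for model training
    "learning_rate": 0.001,       # initial learning rate
    "epochs": 10,                 # "trained for 10 epochs"
}

DECODING_CONFIG = {
    "optimizer": "nesterov",      # Nesterov used during decoding
    "learning_rate": 0.1,
    "decode_iterations": 10,      # "decoding was run for 10 iterations"
    "mu_init": 0.5,               # penalty coefficients start at 0.5
    "mu_growth": 1.2,             # multiplied by 1.2 ...
    "mu_growth_every": 5,         # ... after 5 iterations (assumed: every 5)
}


def penalty_schedule(cfg=DECODING_CONFIG):
    """Yield the penalty coefficient mu at each decoding iteration."""
    mu = cfg["mu_init"]
    for it in range(1, cfg["decode_iterations"] + 1):
        yield it, mu
        if it % cfg["mu_growth_every"] == 0:
            mu *= cfg["mu_growth"]


if __name__ == "__main__":
    for it, mu in penalty_schedule():
        print(f"iteration {it:2d}: mu = {mu:.3f}")
```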