AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses

Authors: Tong Niu, Mohit Bansal (pp. 8560-8567)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "All four models outperform their base LSTM-RNN model on both diversity and relevance by a large margin, and are comparable to or better than competitive baselines (also verified via human evaluation)." (Sections 4, "Experimental Setup", and 5, "Results and Analysis")
Researcher Affiliation | Academia | Tong Niu, Mohit Bansal, UNC Chapel Hill, {tongn, mbansal}@cs.unc.edu
Pseudocode | No | The paper includes diagrams of the models (e.g., Figure 1: MinAvgOut model, Figure 3: LFT model, Figure 4: RL model), but it does not contain any structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | "We will release all our code and model outputs." (footnote 1)
Open Datasets | Yes | "We use the task-oriented Ubuntu Dialogue dataset (Lowe et al. 2015), because it not only has F1 metrics to evaluate the relevance of responses, but the dialogues in them are also open-ended to allow enough space for diversity."
Dataset Splits | No | The paper states "We use the task-oriented Ubuntu Dialogue dataset (Lowe et al. 2015)", but it does not provide specific details on how this dataset was split into training, validation, and test sets (e.g., exact percentages or sample counts for each split).
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper describes the architecture of the models (e.g., "LSTM is identical to that proposed by Bahdanau, Cho, and Bengio (2015)"), but it does not provide specific ancillary software details with version numbers (e.g., Python, PyTorch, TensorFlow versions, or other libraries).
Experiment Setup | Yes | "For each of the three models, the hidden size of the encoder is 256, while the decoder hidden size is 512. For MINAVGOUT, the coefficient of the regularization loss term α is 100.0; for LFT, during inference we feed a score of 0.015 since it achieves a good balance between response coherence and diversity; for RL, the coefficient of the RL term β is 100.0. For the hybrid model MINAVGOUT + RL, α and β share a coefficient of 50.0."
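
For anyone attempting to re-run the setup, the hyperparameters quoted in the "Experiment Setup" row can be collected into a single configuration. The sketch below is a minimal illustration in Python: only the numeric values come from the paper's experiment-setup description, while the function name, argument names, and the way the loss terms are combined (especially for the hybrid model) are assumptions made for illustration, not the authors' released implementation.

```python
# Reported hyperparameters, gathered into one place. Only the numbers are
# taken from the paper; every identifier below is a hypothetical placeholder.
CONFIG = {
    "encoder_hidden_size": 256,    # "the hidden size of the encoder is 256"
    "decoder_hidden_size": 512,    # "the decoder hidden size is 512"
    "alpha_min_avg_out": 100.0,    # coefficient of the regularization loss term (MINAVGOUT)
    "lft_inference_score": 0.015,  # score fed to LFT at inference time
    "beta_rl": 100.0,              # coefficient of the RL term (RL model)
    "hybrid_coefficient": 50.0,    # shared alpha/beta for MINAVGOUT + RL
}


def total_loss(ce_loss, reg_loss=None, rl_term=None, hybrid=False):
    """Combine a cross-entropy loss with optional regularization / RL terms
    using the coefficients reported in the paper. How the terms interact in
    the hybrid model is assumed here, not confirmed by the report."""
    loss = ce_loss
    if hybrid:
        # MINAVGOUT + RL: alpha and beta "share a coefficient of 50.0"
        loss = loss + CONFIG["hybrid_coefficient"] * (reg_loss + rl_term)
    else:
        if reg_loss is not None:   # MINAVGOUT variant
            loss = loss + CONFIG["alpha_min_avg_out"] * reg_loss
        if rl_term is not None:    # RL variant
            loss = loss + CONFIG["beta_rl"] * rl_term
    return loss
```

A hypothetical call for the hybrid model would look like `total_loss(ce, reg_loss=r, rl_term=a, hybrid=True)`; the encoder/decoder sizes are recorded in `CONFIG` only for completeness.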