AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses

Authors: Tong Niu, Mohit Bansal (pp. 8560-8567)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "All four models outperform their base LSTM-RNN model on both diversity and relevance by a large margin, and are comparable to or better than competitive baselines (also verified via human evaluation)." (Sections 4, "Experimental Setup", and 5, "Results and Analysis")
Researcher Affiliation | Academia | Tong Niu, Mohit Bansal, UNC Chapel Hill, {tongn, mbansal}@cs.unc.edu
Pseudocode | No | The paper includes diagrams of the models (e.g., Figure 1: MinAvgOut model, Figure 3: LFT model, Figure 4: RL model), but it does not contain any structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | "We will release all our code and model outputs." (footnote 1)
Open Datasets | Yes | "We use the task-oriented Ubuntu Dialogue dataset (Lowe et al. 2015), because it not only has F1 metrics to evaluate the relevance of responses, but the dialogues in them are also open-ended to allow enough space for diversity."
Dataset Splits | No | The paper states "We use the task-oriented Ubuntu Dialogue dataset (Lowe et al. 2015)", but it does not provide specific details on how this dataset was split into training, validation, and test sets (e.g., exact percentages or sample counts for each split).
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper describes the architecture of the models (e.g., "LSTM is identical to that proposed by Bahdanau, Cho, and Bengio (2015)"), but it does not provide specific ancillary software details with version numbers (e.g., Python, PyTorch, TensorFlow versions, or other libraries).
Experiment Setup | Yes | "For each of the three models, the hidden size of the encoder is 256, while the decoder hidden size is 512. For MINAVGOUT, the coefficient of the regularization loss term α is 100.0; for LFT, during inference we feed a score of 0.015 since it achieves a good balance between response coherence and diversity; for RL, the coefficient of the RL term β is 100.0. For the hybrid model MINAVGOUT + RL, α and β share a coefficient of 50.0."
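
For anyone attempting to re-run the setup, the hyperparameters quoted in the "Experiment Setup" row can be collected into a single configuration. The sketch below is a minimal illustration in Python: only the numeric values come from the paper's experiment-setup description, while the function name, argument names, and the way the loss terms are combined (especially for the hybrid model) are assumptions made for illustration, not the authors' released implementation.

```python
# Reported hyperparameters, gathered into one place. Only the numbers are
# taken from the paper; every identifier below is a hypothetical placeholder.
CONFIG = {
    "encoder_hidden_size": 256,    # "the hidden size of the encoder is 256"
    "decoder_hidden_size": 512,    # "the decoder hidden size is 512"
    "alpha_min_avg_out": 100.0,    # coefficient of the regularization loss term (MINAVGOUT)
    "lft_inference_score": 0.015,  # score fed to LFT at inference time
    "beta_rl": 100.0,              # coefficient of the RL term (RL model)
    "hybrid_coefficient": 50.0,    # shared alpha/beta for MINAVGOUT + RL
}


def total_loss(ce_loss, reg_loss=None, rl_term=None, hybrid=False):
    """Combine a cross-entropy loss with optional regularization / RL terms
    using the coefficients reported in the paper. How the terms interact in
    the hybrid model is assumed here, not confirmed by the report."""
    loss = ce_loss
    if hybrid:
        # MINAVGOUT + RL: alpha and beta "share a coefficient of 50.0"
        loss = loss + CONFIG["hybrid_coefficient"] * (reg_loss + rl_term)
    else:
        if reg_loss is not None:   # MINAVGOUT variant
            loss = loss + CONFIG["alpha_min_avg_out"] * reg_loss
        if rl_term is not None:    # RL variant
            loss = loss + CONFIG["beta_rl"] * rl_term
    return loss
```

A hypothetical call for the hybrid model would look like `total_loss(ce, reg_loss=r, rl_term=a, hybrid=True)`; the encoder/decoder sizes are recorded in `CONFIG` only for completeness.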