AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses
Authors: Tong Niu, Mohit Bansal (pp. 8560–8567)
AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | All four models outperform their base LSTM-RNN model on both diversity and relevance by a large margin, and are comparable to or better than competitive baselines (also verified via human evaluation). See Sections 4 (Experimental Setup) and 5 (Results and Analysis). |
| Researcher Affiliation | Academia | Tong Niu, Mohit Bansal, UNC Chapel Hill, {tongn, mbansal}@cs.unc.edu |
| Pseudocode | No | The paper includes diagrams of the models (e.g., Figure 1: Min Avg Out model, Figure 3: LFT model, Figure 4: RL model), but it does not contain any structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper states only (footnote 1): "We will release all our code and model outputs." |
| Open Datasets | Yes | We use the task-oriented Ubuntu Dialogue dataset (Lowe et al. 2015), because it not only has F1 metrics to evaluate the relevance of responses, but the dialogues in them are also open-ended to allow enough space for diversity. |
| Dataset Splits | No | The paper states "We use the task-oriented Ubuntu Dialogue dataset (Lowe et al. 2015)", but it does not provide specific details on how this dataset was split into training, validation, and test sets (e.g., exact percentages or sample counts for each split). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper describes the architecture of the models (e.g., "LSTM is identical to that proposed by Bahdanau, Cho, and Bengio (2015)"), but it does not provide specific ancillary software details with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | Yes | For each of the three models, the hidden size of the encoder is 256, while the decoder hidden size is 512. For MINAVGOUT, the coefficient of the regularization loss term α is 100.0. For LFT, during inference a score of 0.015 is fed, since it achieves a good balance between response coherence and diversity. For RL, the coefficient of the RL term β is 100.0. For the hybrid model MINAVGOUT + RL, α and β share a coefficient of 50.0. |
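
For readers who want to sanity-check the reported setup, the sketch below simply collects the quoted hyperparameters into a config and shows one plausible way the loss coefficients could be combined. The names `CONFIG`, `combined_loss`, `avgout_reg`, and `rl_term` are hypothetical placeholders, not the authors' code; only the numeric values (hidden sizes 256/512, α = 100.0, the LFT inference score 0.015, β = 100.0, and the shared hybrid coefficient 50.0) come from the paper's quoted Experiment Setup.

```python
# Minimal sketch of the reported hyperparameters; everything except the
# numeric values is an illustrative assumption, not the authors' code.
CONFIG = {
    "encoder_hidden_size": 256,     # quoted in the paper
    "decoder_hidden_size": 512,     # quoted in the paper
    "alpha_minavgout": 100.0,       # coefficient of the MinAvgOut regularization term
    "lft_inference_score": 0.015,   # diversity score fed to LFT at inference
    "beta_rl": 100.0,               # coefficient of the RL term
    "hybrid_shared_coeff": 50.0,    # shared alpha/beta for the MinAvgOut + RL hybrid
}


def combined_loss(ce_loss: float, avgout_reg: float, rl_term: float,
                  hybrid: bool = False) -> float:
    """Combine a cross-entropy loss with auxiliary terms using the reported
    coefficients.

    `avgout_reg` and `rl_term` are placeholders for the paper's
    regularization and RL objectives, whose definitions are not
    reproduced here.
    """
    if hybrid:
        c = CONFIG["hybrid_shared_coeff"]
        return ce_loss + c * avgout_reg + c * rl_term
    return ce_loss + CONFIG["alpha_minavgout"] * avgout_reg


# Example: total loss for one MinAvgOut training step (dummy values).
print(combined_loss(ce_loss=2.3, avgout_reg=0.01, rl_term=0.0))
```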