Detecting Egregious Responses in Neural Sequence-to-sequence Models
Authors: Tianxing He, James Glass
ICLR 2019 | Conference PDF | Archive PDF
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We adopt an empirical methodology, in which we first create lists of egregious output sequences, and then design a discrete optimization algorithm to find input sequences that will cause the model to generate them. Moreover, the optimization algorithm is enhanced for large vocabulary search and constrained to search for input sequences that are likely to be input by real-world users. In our experiments, we apply this approach to dialogue response generation models trained on three real-world dialogue data-sets: Ubuntu, Switchboard and Open Subtitles, testing whether the model can generate malicious responses. |
| Researcher Affiliation | Academia | Tianxing He & James Glass Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge, MA, USA {tianxing,glass}@mit.edu |
| Pseudocode | Yes | Algorithm 1 Gibbs-enum algorithm (a rough sketch of this search appears below the table) |
| Open Source Code | Yes | The pytorch toolkit is used for all neural network related implementations, we publish all our code, data and trained model at https://github.mit.edu/tianxing/iclr2019_gibbsenum. |
| Open Datasets | Yes | Three publicly available conversational dialogue data-sets are used: Ubuntu, Switchboard, and Open Subtitles. The Ubuntu Dialogue Corpus (Lowe et al., 2015) consists of two-person conversations extracted from the Ubuntu chat logs... The Switchboard Dialogue Act Corpus is a version of the Switchboard Telephone Speech Corpus... we also report experiments on the Open Subtitles data-set (Tiedemann, 2009). |
| Dataset Splits | No | The paper specifies training and testing data splits but does not explicitly mention a separate validation set split or methodology for it. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for experiments. |
| Software Dependencies | No | The paper mentions "The pytorch toolkit is used for all neural network related implementations" but does not specify a version number or other software dependencies with versions. |
| Experiment Setup | Yes | For all data-sets, we first train the LSTM based LM and seq2seq models with one hidden layer of size 600, and the embedding size is set to 300. For Switchboard a dropout layer with rate 0.3 is added because over-fitting is observed. The mini-batch size is set to 64 and we apply SGD training with a fixed starting learning rate (LR) for 10 iterations, and then another 10 iterations with LR halving. For Ubuntu and Switchboard, the starting LR is 1, while for Open Subtitles a starting LR of 0.1 is used. (A minimal PyTorch configuration sketch appears below the table.) |
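
As a supplement to the pseudocode row: the paper's Gibbs-enum algorithm performs a coordinate-wise (Gibbs-sampling-style) hill-climb over input tokens, repeatedly replacing each position with the word that most increases the model's log-probability of generating the target egregious output. The sketch below is a minimal illustration under that reading, not the paper's released code: `gibbs_enum` and `score_fn` are assumed names, and the brute-force enumeration over the whole vocabulary stands in for the paper's enhancement for large-vocabulary search.

```python
import torch

def gibbs_enum(score_fn, vocab_size, input_len, n_sweeps=5):
    """Coordinate-wise search for an input x maximizing score_fn(x),
    where score_fn(x) stands for log p(target | x) under a fixed
    seq2seq model. `score_fn` is an assumed interface, not the paper's.
    """
    # Start from a random input sequence.
    x = torch.randint(0, vocab_size, (input_len,))
    best = score_fn(x)
    for _ in range(n_sweeps):
        improved = False
        for pos in range(input_len):            # one sweep over positions
            keep = x[pos].item()
            for cand in range(vocab_size):      # try every candidate word
                if cand == keep:
                    continue
                x[pos] = cand
                s = score_fn(x)
                if s > best:                    # greedily keep the best word
                    best, keep, improved = s, cand, True
            x[pos] = keep                       # restore the winning word
        if not improved:                        # no position changed: converged
            break
    return x, best

# Toy check: reward inputs that match a hidden key sequence.
key = torch.tensor([3, 1, 4, 1, 5])
x, s = gibbs_enum(lambda x: -(x != key).sum().float(),
                  vocab_size=10, input_len=5)
```

The paper also constrains the search to input sequences likely to be typed by real-world users; in this sketch, one natural way to realize that would be to add a language-model log-probability term for `x` inside `score_fn`, though the paper's exact formulation should be taken from the released code.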
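Similarly, the experiment-setup row can be read as a concrete PyTorch configuration. Below is a minimal sketch under that reading; the class name, vocabulary size, and loop structure are assumptions not fixed by the quoted text (the authors' released code at the URL above is authoritative), and "iterations" is interpreted here as epochs.

```python
import torch
import torch.nn as nn

VOCAB = 20000                 # assumed; the quoted text does not give a size
EMB, HID = 300, 600           # embedding / hidden sizes from the paper

class Seq2Seq(nn.Module):
    def __init__(self, dropout=0.0):            # 0.3 for Switchboard only
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.LSTM(EMB, HID, num_layers=1, batch_first=True)
        self.decoder = nn.LSTM(EMB, HID, num_layers=1, batch_first=True)
        self.drop = nn.Dropout(dropout)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, src, tgt):
        _, state = self.encoder(self.emb(src))  # encode the dialogue context
        h, _ = self.decoder(self.emb(tgt), state)
        return self.out(self.drop(h))           # per-step vocabulary logits

model = Seq2Seq(dropout=0.3)                    # Switchboard variant
opt = torch.optim.SGD(model.parameters(), lr=1.0)  # LR 0.1 for Open Subtitles
for epoch in range(20):
    if epoch >= 10:                             # halve LR over the last 10 epochs
        for g in opt.param_groups:
            g["lr"] *= 0.5
    # ... loop over mini-batches of size 64, cross-entropy loss, opt.step() ...
```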