Detecting Egregious Responses in Neural Sequence-to-sequence Models
Authors: Tianxing He, James Glass
ICLR 2019 | Conference PDF | Archive PDF
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We adopt an empirical methodology, in which we first create lists of egregious output sequences, and then design a discrete optimization algorithm to find input sequences that will cause the model to generate them. Moreover, the optimization algorithm is enhanced for large vocabulary search and constrained to search for input sequences that are likely to be input by real-world users. In our experiments, we apply this approach to dialogue response generation models trained on three real-world dialogue data-sets: Ubuntu, Switchboard and Open Subtitles, testing whether the model can generate malicious responses. |
| Researcher Affiliation | Academia | Tianxing He & James Glass Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge, MA, USA {tianxing,glass}@mit.edu |
| Pseudocode | Yes | Algorithm 1 Gibbs-enum algorithm (a rough sketch of this search appears below the table) |
| Open Source Code | Yes | The pytorch toolkit is used for all neural network related implementations, we publish all our code, data and trained model at https://github.mit.edu/tianxing/iclr2019_gibbsenum. |
| Open Datasets | Yes | Three publicly available conversational dialogue data-sets are used: Ubuntu, Switchboard, and Open Subtitles. The Ubuntu Dialogue Corpus (Lowe et al., 2015) consists of two-person conversations extracted from the Ubuntu chat logs... The Switchboard Dialogue Act Corpus is a version of the Switchboard Telephone Speech Corpus... we also report experiments on the Open Subtitles data-set (Tiedemann, 2009). |
| Dataset Splits | No | The paper specifies training and testing data splits but does not explicitly mention a separate validation set split or methodology for it. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for experiments. |
| Software Dependencies | No | The paper mentions "The pytorch toolkit is used for all neural network related implementations" but does not specify a version number or other software dependencies with versions. |
| Experiment Setup | Yes | For all data-sets, we first train the LSTM based LM and seq2seq models with one hidden layer of size 600, and the embedding size is set to 300. For Switchboard a dropout layer with rate 0.3 is added because over-fitting is observed. The mini-batch size is set to 64 and we apply SGD training with a fixed starting learning rate (LR) for 10 iterations, and then another 10 iterations with LR halving. For Ubuntu and Switchboard, the starting LR is 1, while for Open Subtitles a starting LR of 0.1 is used. (A minimal PyTorch configuration sketch appears below the table.) |
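
As a supplement to the pseudocode row: the paper's Gibbs-enum algorithm performs a coordinate-wise (Gibbs-sampling-style) hill-climb over input tokens, repeatedly replacing each position with the word that most increases the model's log-probability of generating the target egregious output. The sketch below is a minimal illustration under that reading, not the paper's released code: `gibbs_enum` and `score_fn` are assumed names, and the brute-force enumeration over the whole vocabulary stands in for the paper's enhancement for large-vocabulary search.

```python
import torch

def gibbs_enum(score_fn, vocab_size, input_len, n_sweeps=5):
    """Coordinate-wise search for an input x maximizing score_fn(x),
    where score_fn(x) stands for log p(target | x) under a fixed
    seq2seq model. `score_fn` is an assumed interface, not the paper's.
    """
    # Start from a random input sequence.
    x = torch.randint(0, vocab_size, (input_len,))
    best = score_fn(x)
    for _ in range(n_sweeps):
        improved = False
        for pos in range(input_len):            # one sweep over positions
            keep = x[pos].item()
            for cand in range(vocab_size):      # try every candidate word
                if cand == keep:
                    continue
                x[pos] = cand
                s = score_fn(x)
                if s > best:                    # greedily keep the best word
                    best, keep, improved = s, cand, True
            x[pos] = keep                       # restore the winning word
        if not improved:                        # no position changed: converged
            break
    return x, best

# Toy check: reward inputs that match a hidden key sequence.
key = torch.tensor([3, 1, 4, 1, 5])
x, s = gibbs_enum(lambda x: -(x != key).sum().float(),
                  vocab_size=10, input_len=5)
```

The paper also constrains the search to input sequences likely to be typed by real-world users; in this sketch, one natural way to realize that would be to add a language-model log-probability term for `x` inside `score_fn`, though the paper's exact formulation should be taken from the released code.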
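Similarly, the experiment-setup row can be read as a concrete PyTorch configuration. Below is a minimal sketch under that reading; the class name, vocabulary size, and loop structure are assumptions not fixed by the quoted text (the authors' released code at the URL above is authoritative), and "iterations" is interpreted here as epochs.

```python
import torch
import torch.nn as nn

VOCAB = 20000                 # assumed; the quoted text does not give a size
EMB, HID = 300, 600           # embedding / hidden sizes from the paper

class Seq2Seq(nn.Module):
    def __init__(self, dropout=0.0):            # 0.3 for Switchboard only
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.LSTM(EMB, HID, num_layers=1, batch_first=True)
        self.decoder = nn.LSTM(EMB, HID, num_layers=1, batch_first=True)
        self.drop = nn.Dropout(dropout)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, src, tgt):
        _, state = self.encoder(self.emb(src))  # encode the dialogue context
        h, _ = self.decoder(self.emb(tgt), state)
        return self.out(self.drop(h))           # per-step vocabulary logits

model = Seq2Seq(dropout=0.3)                    # Switchboard variant
opt = torch.optim.SGD(model.parameters(), lr=1.0)  # LR 0.1 for Open Subtitles
for epoch in range(20):
    if epoch >= 10:                             # halve LR over the last 10 epochs
        for g in opt.param_groups:
            g["lr"] *= 0.5
    # ... loop over mini-batches of size 64, cross-entropy loss, opt.step() ...
```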