Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models

Authors: Bing Liu, Tong Yu, Ian Lane, Ole Mengshoel

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the Ubuntu Dialogue Corpus demonstrate significant performance gains of the proposed method over conventional linear contextual bandits. Moreover, we report encouraging response selection performance of the proposed neural bandit model using the Recall@k metric for a small set of online training samples.
Researcher Affiliation | Academia | Electrical and Computer Engineering, Carnegie Mellon University {liubing, lane}@cmu.edu, tongy1@andrew.cmu.edu, ole.mengshoel@sv.cmu.edu
Pseudocode | No | The paper describes the model details and methods in prose and equations, but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a link to a code repository for the methodology described.
Open Datasets | Yes | We use the Ubuntu Dialogue Corpus (UDC) (Lowe et al. 2015) in our evaluation.
Dataset Splits | Yes | In model training, the bidirectional LSTM encoder is trained using data from the original UDC training set. Online bandit learning and evaluation is performed using data sampled from the UDC test set. ... We split our 1000 learning samples into two parts: 800 samples are used in the online learning by bandits, and the rest 200 samples are set aside for evaluation.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running its experiments.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., programming language, libraries, or frameworks) are explicitly mentioned in the paper.
Experiment Setup | Yes | LSTM state size and output size are both set as 128. Word embeddings of size 150 are randomly initialized and fine-tuned during mini-batch (size 128) training. We use Adam optimizer (Kingma and Ba 2014) in the neural network offline model training with initial learning rate of 1e-3. Dropout (Srivastava et al. 2014) with keep probability of 0.5 is applied during offline supervised pre-training.
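To make the quoted Experiment Setup easier to scan, the sketch below collects the reported offline pre-training hyperparameters into a single configuration. This is a minimal illustration; the key names and structure are assumptions, not taken from the authors' code.

```python
# Hypothetical configuration gathering the hyperparameters quoted in the
# Experiment Setup row; names and layout are illustrative, not the authors' code.
OFFLINE_PRETRAIN_CONFIG = {
    "lstm_state_size": 128,      # bidirectional LSTM state size
    "lstm_output_size": 128,     # bidirectional LSTM output size
    "word_embedding_dim": 150,   # randomly initialized, fine-tuned during training
    "batch_size": 128,           # mini-batch size
    "optimizer": "adam",         # Adam (Kingma and Ba 2014)
    "learning_rate": 1e-3,       # initial learning rate
    "dropout_keep_prob": 0.5,    # dropout during offline supervised pre-training
}
```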
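The Dataset Splits row describes sampling 1000 learning examples from the UDC test set and dividing them into 800 online-learning samples and 200 held-out evaluation samples. A minimal sketch of such a split is given below; the shuffling, the fixed seed, and the function name are illustrative assumptions rather than details from the paper.

```python
import random

def split_online_learning_samples(samples, n_online=800, n_eval=200, seed=0):
    """Split the sampled UDC test-set examples into an online-learning portion
    and a held-out evaluation portion, mirroring the quoted 800/200 split.
    The shuffle and seed are illustrative assumptions."""
    assert len(samples) == n_online + n_eval
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled[:n_online], shuffled[n_online:]

# Example usage with placeholder data:
# online_set, eval_set = split_online_learning_samples(list(range(1000)))
```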
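The quoted abstract reports response selection performance with the Recall@k metric. For reference, a generic sketch of Recall@k for response selection, counting the fraction of examples whose ground-truth response appears among the top-k ranked candidates, is shown below; the data layout is an assumption and this is not the authors' evaluation code.

```python
def recall_at_k(ranked_candidates, ground_truths, k):
    """Generic Recall@k for response selection: the fraction of examples whose
    ground-truth response appears in the top-k ranked candidates."""
    hits = sum(
        1 for ranked, truth in zip(ranked_candidates, ground_truths)
        if truth in ranked[:k]
    )
    return hits / len(ground_truths)

# Example: one test case, ground truth ranked second among three candidates.
# recall_at_k([["resp_b", "resp_a", "resp_c"]], ["resp_a"], k=2) returns 1.0
```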