Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models
Authors: Bing Liu, Tong Yu, Ian Lane, Ole Mengshoel
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the Ubuntu Dialogue Corpus demonstrate significant performance gains of the proposed method over conventional linear contextual bandits. Moreover, we report encouraging response selection performance of the proposed neural bandit model using the Recall@k metric for a small set of online training samples. (A hedged Recall@k sketch follows the table.) |
| Researcher Affiliation | Academia | Electrical and Computer Engineering, Carnegie Mellon University {liubing, lane}@cmu.edu, tongy1@andrew.cmu.edu, ole.mengshoel@sv.cmu.edu |
| Pseudocode | No | The paper describes the model details and methods in prose and equations, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We use the Ubuntu Dialogue Corpus (UDC) (Lowe et al. 2015) in our evaluation. |
| Dataset Splits | Yes | In model training, the bidirectional LSTM encoder is trained using data from the original UDC training set. Online bandit learning and evaluation is performed using data sampled from the UDC test set. ... We split our 1000 learning samples into two parts: 800 samples are used in the online learning by bandits, and the rest 200 samples are set aside for evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., programming language, libraries, or frameworks) are explicitly mentioned in the paper. |
| Experiment Setup | Yes | LSTM state size and output size are both set as 128. Word embeddings of size 150 are randomly initialized and fine-tuned during mini-batch (size 128) training. We use Adam optimizer (Kingma and Ba 2014) in the neural network offline model training with initial learning rate of 1e-3. Dropout (Srivastava et al. 2014) with keep probability of 0.5 is applied during offline supervised pre-training. (See the configuration sketch after the table.) |
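
The Research Type row above cites the Recall@k metric on the Ubuntu Dialogue Corpus. The snippet below is a minimal sketch of that metric, not the authors' code: it scores each context against a pool of candidate responses and counts an example as a hit when the ground-truth response lands in the top k. The score-matrix layout and the convention that the true response sits in column 0 are assumptions; the toy usage mirrors the 200 held-out evaluation samples mentioned in the Dataset Splits row.

```python
import numpy as np

def recall_at_k(scores, k=1, true_index=0):
    """1-in-N Recall@k: fraction of examples whose ground-truth response
    is ranked within the top k candidates by model score.

    scores: (num_examples, num_candidates) array, higher = better match.
    true_index: column assumed to hold the ground-truth response.
    """
    rankings = np.argsort(-scores, axis=1)               # rank candidates per example
    hits = np.any(rankings[:, :k] == true_index, axis=1)  # true response in top k?
    return hits.mean()

# Toy usage: 200 evaluation examples, 1-in-10 candidate selection.
rng = np.random.default_rng(0)
dummy_scores = rng.random((200, 10))
print(recall_at_k(dummy_scores, k=1), recall_at_k(dummy_scores, k=5))
```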
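
The Experiment Setup row lists the offline pre-training hyperparameters: a bidirectional LSTM with state and output size 128, 150-dimensional embeddings, mini-batch size 128, Adam with initial learning rate 1e-3, and dropout keep probability 0.5. The sketch below wires those numbers into an encoder; PyTorch itself, the vocabulary size, and the encoder structure beyond the stated hyperparameters are assumptions, since the paper does not name its framework (see also the Software Dependencies row).

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 30000   # assumption: vocabulary size is not reported in the paper
EMBED_DIM = 150      # word embedding size (paper)
HIDDEN_SIZE = 128    # LSTM state/output size (paper)
BATCH_SIZE = 128     # mini-batch size (paper)

class UtteranceEncoder(nn.Module):
    """Bidirectional LSTM utterance encoder, pre-trained offline in this sketch."""
    def __init__(self):
        super().__init__()
        # Randomly initialized embeddings, fine-tuned during training.
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        # Keep probability 0.5 corresponds to a dropout probability of 0.5.
        self.dropout = nn.Dropout(p=0.5)
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN_SIZE,
                            batch_first=True, bidirectional=True)

    def forward(self, token_ids):
        x = self.dropout(self.embed(token_ids))
        _, (h_n, _) = self.lstm(x)
        # Concatenate the final forward and backward hidden states.
        return torch.cat([h_n[0], h_n[1]], dim=-1)

encoder = UtteranceEncoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)  # initial learning rate 1e-3
```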