ALOHA: Artificial Learning of Human Attributes for Dialogue Agents
Authors: Aaron W. Li, Veronica Jiang, Steven Y. Feng, Julia Sprague, Wei Zhou, Jesse Hoey
AAAI 2020, pp. 8155–8163 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our preliminary experiments demonstrate that two variations of ALOHA, combined with our proposed dataset, can outperform baseline models at identifying the correct dialogue responses of chosen target characters, and are stable regardless of the character's identity, the genre of the show, and the context of the dialogue. |
| Researcher Affiliation | Collaboration | ¹David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada; ²Huawei Technologies Co., Ltd. {w89li, r4jiang, sy2feng, jsprague, jhoey}@uwaterloo.ca, wei.zhou1@huawei.com |
| Pseudocode | No | The paper describes the system architecture and its components (CSM, CCM, LSRM) in detail, but it does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release all of ALOHA's data and code along with additional information for reproduction: https://github.com/newpro/aloha-chatbot |
| Open Datasets | Yes | By combining detailed HLA data with dialogue data for specific characters, we present a dataset, HLA-Chat, that models character profiles and gives dialogue agents the ability to learn characters' language styles through their HLAs. We release all of ALOHA's data and code along with additional information for reproduction. |
| Dataset Splits | Yes | Five-fold cross validation is used for the training and testing of the Uniform Model and two LSRM variations. The folds are divided randomly by the TV shows in our dialogue data. We use the dialogue data for 80% of these shows as the four folds for training, and the dialogue data for the remaining 20% as the fifth fold for testing. During training, 30% of the character-HLA pairs (which are either 0 or 1) are masked and used as a validation set (see Figure 4). A grouped-split sketch follows the table. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using BERT, Poly-encoder, fastText embeddings, SGD, and Adam optimizers, but it does not specify any version numbers for these software components or libraries (e.g., PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | We cap the length of the OBS at 360 tokens and the length of each candidate response at 72 tokens. We use a batch size of 80, a learning rate of 5e-5, and perform warm-up updates for 1000 iterations. The learning rate scheduler uses the SGD optimizer with Nesterov's accelerated gradient descent (Sutskever et al. 2013) and is set to have a decay of 0.4 and to reduce on plateau. We initialize using pretrained fastText (Bojanowski et al. 2017) embeddings. Other than using a smaller batch size of 80, we adapt all parameters used in Humeau et al. (2019): Adam optimizer with learning rate of 2e-4, β1 = 0.9, β2 = 0.98, no L2 weight decay, linear learning rate warmup, and inverse square root decay of the learning rate. See the optimizer sketch after the table. |
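
The dataset-splits row describes five-fold cross validation where folds are grouped by TV show rather than by individual utterance. Below is a minimal sketch of that grouping, assuming dialogue records are `(show_id, line)` pairs; the function name and data layout are illustrative and not taken from the released code.

```python
import random

def five_fold_by_show(dialogues, seed=0):
    """Yield five train/test splits whose folds are whole TV shows,
    so roughly 80% of shows train and 20% test on each iteration."""
    shows = sorted({show for show, _ in dialogues})
    random.Random(seed).shuffle(shows)
    folds = [shows[i::5] for i in range(5)]  # five disjoint groups of shows
    for k, test_shows in enumerate(folds):
        held_out = set(test_shows)
        train = [d for d in dialogues if d[0] not in held_out]
        test = [d for d in dialogues if d[0] in held_out]
        yield k, train, test
```

Grouping by show, rather than splitting utterances at random, keeps every line of a held-out show out of training, which matches the paper's claim that results are stable across unseen shows.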
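The experiment-setup row quotes two optimizer configurations. The PyTorch sketch below mirrors the reported hyperparameters; the stand-in module, the SGD momentum value, and reusing the 1000-step warmup length for the Adam schedule are assumptions, since the paper does not state them for those configurations.

```python
import torch

model = torch.nn.Linear(300, 300)  # stand-in for the actual ranker

# SGD with Nesterov momentum and reduce-on-plateau with a decay factor of 0.4.
# The momentum value (0.9) is an assumption; the paper does not report it.
sgd = torch.optim.SGD(model.parameters(), lr=5e-5, momentum=0.9, nesterov=True)
plateau = torch.optim.lr_scheduler.ReduceLROnPlateau(sgd, factor=0.4)
# Call plateau.step(validation_loss) once per validation round.

# Poly-encoder setup following Humeau et al. (2019): Adam, lr 2e-4,
# betas (0.9, 0.98), no L2 weight decay, linear warmup then inverse
# square-root decay. The 1000-step warmup length is carried over from
# the first configuration as an assumption.
adam = torch.optim.Adam(model.parameters(), lr=2e-4,
                        betas=(0.9, 0.98), weight_decay=0.0)
warmup_steps = 1000
schedule = torch.optim.lr_scheduler.LambdaLR(
    adam,
    lambda step: min((step + 1) / warmup_steps,
                     (warmup_steps / (step + 1)) ** 0.5))
```

The `LambdaLR` multiplier rises linearly to 1 over the warmup steps and then decays as the inverse square root of the step count, which is the schedule named in the quoted excerpt.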