ALOHA: Artificial Learning of Human Attributes for Dialogue Agents
Authors: Aaron W. Li, Veronica Jiang, Steven Y. Feng, Julia Sprague, Wei Zhou, Jesse Hoey
AAAI 2020, pp. 8155–8163 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our preliminary experiments demonstrate that two variations of ALOHA, combined with our proposed dataset, can outperform baseline models at identifying the correct dialogue responses of chosen target characters, and are stable regardless of the character's identity, the genre of the show, and the context of the dialogue. |
| Researcher Affiliation | Collaboration | ¹David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada; ²Huawei Technologies Co., Ltd. {w89li, r4jiang, sy2feng, jsprague, jhoey}@uwaterloo.ca, wei.zhou1@huawei.com |
| Pseudocode | No | The paper describes the system architecture and its components (CSM, CCM, LSRM) in detail, but it does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release all of ALOHA's data and code along with additional information for reproduction: https://github.com/newpro/aloha-chatbot |
| Open Datasets | Yes | By combining detailed HLA data with dialogue data for specific characters, we present a dataset, HLA-Chat, that models character profiles and gives dialogue agents the ability to learn characters' language styles through their HLAs. We release all of ALOHA's data and code along with additional information for reproduction. |
| Dataset Splits | Yes | Five-fold cross validation is used for the training and testing of the Uniform Model and two LSRM variations. The folds are divided randomly by the TV shows in our dialogue data. We use the dialogue data for 80% of these shows as the four folds for training, and the dialogue data for the remaining 20% as the fifth fold for testing. During training, 30% of the character-HLA pairs (which are either 0 or 1) are masked and used as a validation set (see Figure 4). A grouped-split sketch follows the table. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using BERT, Poly-encoder, fastText embeddings, SGD, and Adam optimizers, but it does not specify any version numbers for these software components or libraries (e.g., PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | We cap the length of the OBS at 360 tokens and the length of each candidate response at 72 tokens. We use a batch size of 80, a learning rate of 5e-5, and perform warm-up updates for 1000 iterations. The learning rate scheduler uses the SGD optimizer with Nesterov's accelerated gradient descent (Sutskever et al. 2013) and is set to have a decay of 0.4 and to reduce on plateau. We initialize using pretrained fastText (Bojanowski et al. 2017) embeddings. Other than using a smaller batch size of 80, we adapt all parameters used in Humeau et al. (2019): Adam optimizer with learning rate of 2e-4, β1 = 0.9, β2 = 0.98, no L2 weight decay, linear learning rate warmup, and inverse square root decay of the learning rate. See the optimizer sketch after the table. |
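
The dataset-splits row describes five-fold cross validation where folds are grouped by TV show rather than by individual utterance. Below is a minimal sketch of that grouping, assuming dialogue records are `(show_id, line)` pairs; the function name and data layout are illustrative and not taken from the released code.

```python
import random

def five_fold_by_show(dialogues, seed=0):
    """Yield five train/test splits whose folds are whole TV shows,
    so roughly 80% of shows train and 20% test on each iteration."""
    shows = sorted({show for show, _ in dialogues})
    random.Random(seed).shuffle(shows)
    folds = [shows[i::5] for i in range(5)]  # five disjoint groups of shows
    for k, test_shows in enumerate(folds):
        held_out = set(test_shows)
        train = [d for d in dialogues if d[0] not in held_out]
        test = [d for d in dialogues if d[0] in held_out]
        yield k, train, test
```

Grouping by show, rather than splitting utterances at random, keeps every line of a held-out show out of training, which matches the paper's claim that results are stable across unseen shows.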
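The experiment-setup row quotes two optimizer configurations. The PyTorch sketch below mirrors the reported hyperparameters; the stand-in module, the SGD momentum value, and reusing the 1000-step warmup length for the Adam schedule are assumptions, since the paper does not state them for those configurations.

```python
import torch

model = torch.nn.Linear(300, 300)  # stand-in for the actual ranker

# SGD with Nesterov momentum and reduce-on-plateau with a decay factor of 0.4.
# The momentum value (0.9) is an assumption; the paper does not report it.
sgd = torch.optim.SGD(model.parameters(), lr=5e-5, momentum=0.9, nesterov=True)
plateau = torch.optim.lr_scheduler.ReduceLROnPlateau(sgd, factor=0.4)
# Call plateau.step(validation_loss) once per validation round.

# Poly-encoder setup following Humeau et al. (2019): Adam, lr 2e-4,
# betas (0.9, 0.98), no L2 weight decay, linear warmup then inverse
# square-root decay. The 1000-step warmup length is carried over from
# the first configuration as an assumption.
adam = torch.optim.Adam(model.parameters(), lr=2e-4,
                        betas=(0.9, 0.98), weight_decay=0.0)
warmup_steps = 1000
schedule = torch.optim.lr_scheduler.LambdaLR(
    adam,
    lambda step: min((step + 1) / warmup_steps,
                     (warmup_steps / (step + 1)) ** 0.5))
```

The `LambdaLR` multiplier rises linearly to 1 over the warmup steps and then decays as the inverse square root of the step count, which is the schedule named in the quoted excerpt.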