Towards Awareness of Human Relational Strategies in Virtual Agents

Authors: Ian Beaver, Cynthia Freeman, Abdullah Mueen

AAAI 2020, pp. 2602-2610 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment: each variable below is listed with its result, followed by the supporting LLM response.
Research Type: Experimental
"Human-computer data from three live customer service IVAs was collected, and annotators marked all text that was deemed unnecessary to the determination of user intention as well as the presence of multiple intents. We show that removal of this language from task-based inputs has a positive effect by both an increase in confidence and improvement in responses, as evaluated by humans, demonstrating the need for IVAs to anticipate relational language injection."
Researcher Affiliation: Collaboration
Ian Beaver and Cynthia Freeman, Verint Next IT, Spokane Valley, WA, USA ({ian.beaver, cynthia.freeman}@verint.com); Abdullah Mueen, Department of Computer Science, University of New Mexico, USA (mueen@unm.edu)
Pseudocode: No
No pseudocode or algorithm blocks are explicitly present in the paper; the methodology is described in narrative text.
Open Source Code: No
The paper shares data but no code: "By providing this methodology and data [1] to the community, we aim to contribute to the development of more relational and, therefore, more human-like IVAs and chatbots." [1] http://s3-us-west-2.amazonaws.com/nextit-public/rsics.html
Open Datasets: Yes
"Most importantly, we create the first publicly available corpus with annotated relational segments. By providing this methodology and data [1] to the community, we aim to contribute to the development of more relational and, therefore, more human-like IVAs and chatbots." [1] http://s3-us-west-2.amazonaws.com/nextit-public/rsics.html
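Since only a landing-page URL is published, a minimal Python sketch for locating the corpus files is shown below. The href-based link scan and the manual-inspection step are assumptions about the page layout, not part of the paper.

```python
import re
import urllib.request

# Landing page cited in the paper's footnote; the corpus files are linked from it.
LANDING = "http://s3-us-west-2.amazonaws.com/nextit-public/rsics.html"

# Fetch the page and scan it for hyperlinks (assumption: files appear as plain href links).
html = urllib.request.urlopen(LANDING).read().decode("utf-8", errors="replace")
links = re.findall(r'href="([^"]+)"', html)
print("\n".join(links))  # inspect manually, then download the listed corpus files
```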
Dataset Splits: No
"From our four datasets of 2,000 requests each, we formed two equally-sized partitions of 4,000 requests with 1,000 pulled from every dataset. Each partition was assigned to four annotators; thus, all 8,000 requests had exactly four independent annotations."
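To make the partitioning scheme concrete, here is a minimal sketch assuming each source dataset is a list of 2,000 request strings; the function name, seed, and shuffling strategy are illustrative, as the paper does not publish code.

```python
import random

def make_annotation_partitions(datasets, per_dataset=1000, seed=0):
    """Form two equal partitions, drawing `per_dataset` requests from each of
    the four source datasets; each partition then goes to four annotators."""
    rng = random.Random(seed)
    part_a, part_b = [], []
    for requests in datasets:                    # four lists of 2,000 requests
        shuffled = rng.sample(requests, len(requests))
        part_a.extend(shuffled[:per_dataset])    # first 1,000 -> partition A
        part_b.extend(shuffled[per_dataset:2 * per_dataset])  # next 1,000 -> B
    return part_a, part_b                        # 4,000 requests each
```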
Hardware Specification: No
No specific hardware details (GPU/CPU models, memory, etc.) are mentioned for the experimental setup or analysis.
Software Dependencies: No
No specific software dependencies with version numbers are mentioned for the experimental setup or analysis.
Experiment Setup: Yes
"To measure the effect of relational language on IVA performance and determine what level of annotator agreement is acceptable, we first constructed highlights for the 6,759 requests using all four levels of annotator agreement. Next, four cleaned requests were generated from each original request by removing the highlighted portion for each threshold of annotator agreement, resulting in 27,036 requests with various amounts of relational language removed. Every unaltered request was fed through its originating IVA, and the intent confidence score and response were recorded. We then fed each of the four cleaned versions to the IVA and recorded the confidence and response. An A-B test was conducted where four annotators were shown the user's original request along with the IVA response from the original request and the IVA response from a cleaned request. They were asked to determine which, if any, response they believed better addressed the original request."
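A minimal sketch of the cleaning and scoring steps described above, assuming each request carries character-level highlight spans with per-span annotator vote counts; that span representation, and the `iva` callable standing in for the proprietary IVAs, are purely hypothetical.

```python
def clean_request(text, spans, votes, threshold):
    """Drop highlighted character spans that at least `threshold` of the four
    annotators marked as relational (threshold in 1..4). `spans` is a list of
    (start, end) pairs; `votes[i]` is the number of annotators who marked span i.
    The span bookkeeping is an assumption; the paper describes this step in prose."""
    removed = sorted(s for s, v in zip(spans, votes) if v >= threshold)
    kept, last = [], 0
    for start, end in removed:
        if start > last:                 # keep text between removed spans
            kept.append(text[last:start])
        last = max(last, end)            # tolerate overlapping spans
    kept.append(text[last:])
    return " ".join(piece.strip() for piece in kept if piece.strip())

def score_request(request, iva):
    """Record the IVA's (confidence, response) for the original request and for
    each of the four cleaned variants, one per agreement threshold."""
    original = iva(request["text"])
    cleaned = {
        t: iva(clean_request(request["text"], request["spans"], request["votes"], t))
        for t in (1, 2, 3, 4)
    }
    return original, cleaned
```

The recorded pairs would then feed the A-B comparison, in which annotators judge the response to the original request against the response to a cleaned variant.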