Towards Awareness of Human Relational Strategies in Virtual Agents

Authors: Ian Beaver, Cynthia Freeman, Abdullah Mueen

AAAI 2020, pp. 2602-2610 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment: each variable below is listed with its result, followed by the supporting LLM response.
Research Type: Experimental
"Human-computer data from three live customer service IVAs was collected, and annotators marked all text that was deemed unnecessary to the determination of user intention as well as the presence of multiple intents. We show that removal of this language from task-based inputs has a positive effect by both an increase in confidence and improvement in responses, as evaluated by humans, demonstrating the need for IVAs to anticipate relational language injection."
Researcher Affiliation: Collaboration
Ian Beaver and Cynthia Freeman, Verint Next IT, Spokane Valley, WA, USA ({ian.beaver, cynthia.freeman}@verint.com); Abdullah Mueen, Department of Computer Science, University of New Mexico, USA (mueen@unm.edu)
Pseudocode: No
No pseudocode or algorithm blocks are explicitly present in the paper; the methodology is described in narrative text.
Open Source Code: No
The paper shares data but no code: "By providing this methodology and data [1] to the community, we aim to contribute to the development of more relational and, therefore, more human-like IVAs and chatbots." [1] http://s3-us-west-2.amazonaws.com/nextit-public/rsics.html
Open Datasets: Yes
"Most importantly, we create the first publicly available corpus with annotated relational segments. By providing this methodology and data [1] to the community, we aim to contribute to the development of more relational and, therefore, more human-like IVAs and chatbots." [1] http://s3-us-west-2.amazonaws.com/nextit-public/rsics.html
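Since only a landing-page URL is published, a minimal Python sketch for locating the corpus files is shown below. The href-based link scan and the manual-inspection step are assumptions about the page layout, not part of the paper.

```python
import re
import urllib.request

# Landing page cited in the paper's footnote; the corpus files are linked from it.
LANDING = "http://s3-us-west-2.amazonaws.com/nextit-public/rsics.html"

# Fetch the page and scan it for hyperlinks (assumption: files appear as plain href links).
html = urllib.request.urlopen(LANDING).read().decode("utf-8", errors="replace")
links = re.findall(r'href="([^"]+)"', html)
print("\n".join(links))  # inspect manually, then download the listed corpus files
```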
Dataset Splits: No
"From our four datasets of 2,000 requests each, we formed two equally-sized partitions of 4,000 requests with 1,000 pulled from every dataset. Each partition was assigned to four annotators; thus, all 8,000 requests had exactly four independent annotations."
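To make the partitioning scheme concrete, here is a minimal sketch assuming each source dataset is a list of 2,000 request strings; the function name, seed, and shuffling strategy are illustrative, as the paper does not publish code.

```python
import random

def make_annotation_partitions(datasets, per_dataset=1000, seed=0):
    """Form two equal partitions, drawing `per_dataset` requests from each of
    the four source datasets; each partition then goes to four annotators."""
    rng = random.Random(seed)
    part_a, part_b = [], []
    for requests in datasets:                    # four lists of 2,000 requests
        shuffled = rng.sample(requests, len(requests))
        part_a.extend(shuffled[:per_dataset])    # first 1,000 -> partition A
        part_b.extend(shuffled[per_dataset:2 * per_dataset])  # next 1,000 -> B
    return part_a, part_b                        # 4,000 requests each
```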
Hardware Specification: No
No specific hardware details (GPU/CPU models, memory, etc.) are mentioned for the experimental setup or analysis.
Software Dependencies: No
No specific software dependencies with version numbers are mentioned for the experimental setup or analysis.
Experiment Setup: Yes
"To measure the effect of relational language on IVA performance and determine what level of annotator agreement is acceptable, we first constructed highlights for the 6,759 requests using all four levels of annotator agreement. Next, four cleaned requests were generated from each original request by removing the highlighted portion for each threshold of annotator agreement, resulting in 27,036 requests with various amounts of relational language removed. Every unaltered request was fed through its originating IVA, and the intent confidence score and response were recorded. We then fed each of the four cleaned versions to the IVA and recorded the confidence and response. An A-B test was conducted where four annotators were shown the user's original request along with the IVA response from the original request and the IVA response from a cleaned request. They were asked to determine which, if any, response they believed better addressed the original request."
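A minimal sketch of the cleaning and scoring steps described above, assuming each request carries character-level highlight spans with per-span annotator vote counts; that span representation, and the `iva` callable standing in for the proprietary IVAs, are purely hypothetical.

```python
def clean_request(text, spans, votes, threshold):
    """Drop highlighted character spans that at least `threshold` of the four
    annotators marked as relational (threshold in 1..4). `spans` is a list of
    (start, end) pairs; `votes[i]` is the number of annotators who marked span i.
    The span bookkeeping is an assumption; the paper describes this step in prose."""
    removed = sorted(s for s, v in zip(spans, votes) if v >= threshold)
    kept, last = [], 0
    for start, end in removed:
        if start > last:                 # keep text between removed spans
            kept.append(text[last:start])
        last = max(last, end)            # tolerate overlapping spans
    kept.append(text[last:])
    return " ".join(piece.strip() for piece in kept if piece.strip())

def score_request(request, iva):
    """Record the IVA's (confidence, response) for the original request and for
    each of the four cleaned variants, one per agreement threshold."""
    original = iva(request["text"])
    cleaned = {
        t: iva(clean_request(request["text"], request["spans"], request["votes"], t))
        for t in (1, 2, 3, 4)
    }
    return original, cleaned
```

The recorded pairs would then feed the A-B comparison, in which annotators judge the response to the original request against the response to a cleaned variant.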