SGD-X: A Benchmark for Robust Generalization in Schema-Guided Dialogue Systems

Authors: Harrison Lee, Raghav Gupta, Abhinav Rastogi, Yuan Cao, Bin Zhang, Yonghui Wu
Pages: 10938-10946

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We explore the robustness of dialogue systems to linguistic variations in schemas by designing SGD-X, a benchmark extending SGD with semantically similar yet stylistically diverse variants for every schema. We observe that two top state tracking models fail to generalize well across schema variants, measured by joint goal accuracy and a novel metric for measuring schema sensitivity. Additionally, we present a simple model-agnostic data augmentation method to improve schema robustness.
Researcher Affiliation | Industry | Google Research {harrisonlee,raghavgupta,abhirast,yuancao,zbin,yonghui}@google.com
Pseudocode | No | The paper does not contain pseudocode or algorithm blocks. It references figures from external papers that might contain such elements but does not provide its own.
Open Source Code | Yes | We release SGD-X and an evaluation script for schema-guided dialogue state tracking models on GitHub at https://github.com/google-research-datasets/dstc8-schema-guided-dialogue
Open Datasets | Yes | The Schema-Guided Dialogue (SGD) dataset introduced a paradigm for enabling models to support any service in zero-shot through schemas, which describe service APIs to models in natural language.
Dataset Splits | No | The paper states that models are trained on the original SGD training set and evaluated on SGD-X, but it does not give train/validation/test split percentages or sample counts.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running its experiments.
Software Dependencies | No | The paper mentions using "spaCy (Honnibal et al. 2020)" and models like "BERT-Base" and "T5-Base" but does not specify version numbers for any software libraries or dependencies used in their implementation.
Experiment Setup | No | The paper states "More training details in the Appendix, available in the arXiv version of this paper." but these details are not provided in the main text of the paper itself.
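
The Research Type row names joint goal accuracy as one of the two evaluation measures. As a rough illustration only, joint goal accuracy is commonly computed as the fraction of turns whose predicted dialogue state matches the reference on every slot. The sketch below assumes exact string matching and a hypothetical dict-per-turn state format; neither is taken from the paper or its released evaluation script, which may match values differently.

```python
from typing import Dict, List

# Hypothetical state format for one turn: slot name -> value.
State = Dict[str, str]


def joint_goal_accuracy(predicted: List[State], reference: List[State]) -> float:
    """Fraction of turns whose predicted state equals the reference state exactly."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must cover the same turns")
    if not reference:
        return 0.0
    # A turn counts as correct only if every slot-value pair matches.
    correct = sum(1 for p, r in zip(predicted, reference) if p == r)
    return correct / len(reference)


# Toy data, invented for illustration.
reference = [
    {"city": "San Jose"},
    {"city": "San Jose", "date": "March 3rd"},
]
predicted = [
    {"city": "San Jose"},
    {"city": "San Jose", "date": "March 4th"},  # one wrong slot fails the whole turn
]
print(joint_goal_accuracy(predicted, reference))  # 0.5
```

Because a single wrong slot fails the whole turn, the measure is strict, which is one reason it can expose sensitivity to reworded schemas.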