Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
CoCo: Controllable Counterfactuals for Evaluating Dialogue State Trackers
Authors: Shiyang Li, Semih Yavuz, Kazuma Hashimoto, Jia Li, Tong Niu, Nazneen Rajani, Xifeng Yan, Yingbo Zhou, Caiming Xiong
ICLR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluating state-of-the-art DST models on the MultiWOZ dataset with CoCo-generated counterfactuals results in a significant performance drop of up to 30.8% (from 49.4% to 18.6%) in absolute joint goal accuracy. |
| Researcher Affiliation | Collaboration | Salesforce Research; University of California, Santa Barbara |
| Pseudocode | No | The paper provides diagrams (e.g., Figure 1, Figure 2) to illustrate processes but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/salesforce/coco-dst |
| Open Datasets | Yes | We train each of these three models following their publicly released implementations on the standard train/dev/test split of MultiWOZ 2.1 (Eric et al., 2019). |
| Dataset Splits | Yes | We train each of these three models following their publicly released implementations on the standard train/dev/test split of MultiWOZ 2.1 (Eric et al., 2019). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using T5-small, BERT-base-uncased, and Adam optimizer, as well as PyTorch/Fairseq for NMT models, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | During training, we use the Adam optimizer (Kingma and Ba, 2015) with an initial learning rate of 5e-5 and set linear warmup to 200 steps. The batch size is set to 36 and the number of training epochs to 10. The maximum sequence length of both encoder and decoder is set to 100. |
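The reported setup (Adam, initial learning rate 5e-5, 200-step linear warmup, batch size 36, 10 epochs, max sequence length 100) can be sketched as a small schedule function. This is a minimal illustration, not the authors' released code; in particular, the assumption that the learning rate ramps linearly to its peak and then stays constant is ours, since the paper does not specify any post-warmup decay.

```python
# Hyperparameters quoted in the paper's experiment setup.
BASE_LR = 5e-5        # initial (peak) learning rate for Adam
WARMUP_STEPS = 200    # linear warmup duration
BATCH_SIZE = 36
NUM_EPOCHS = 10
MAX_SEQ_LEN = 100     # applies to both encoder and decoder inputs

def learning_rate(step: int) -> float:
    """Linearly warm up to BASE_LR over WARMUP_STEPS, then hold constant.

    Constant-after-warmup is an assumption; the paper only states
    that a 200-step linear warmup is used.
    """
    if step < WARMUP_STEPS:
        return BASE_LR * (step + 1) / WARMUP_STEPS
    return BASE_LR
```

For example, the schedule reaches its peak of 5e-5 exactly at step 199 (the 200th step) and stays there for the remainder of training.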