CoCo: Controllable Counterfactuals for Evaluating Dialogue State Trackers
Authors: Shiyang Li, Semih Yavuz, Kazuma Hashimoto, Jia Li, Tong Niu, Nazneen Rajani, Xifeng Yan, Yingbo Zhou, Caiming Xiong
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluating state-of-the-art DST models on the MultiWOZ dataset with CoCo-generated counterfactuals results in a significant performance drop of up to 30.8% (from 49.4% to 18.6%) in absolute joint goal accuracy. |
| Researcher Affiliation | Collaboration | Salesforce Research; University of California, Santa Barbara |
| Pseudocode | No | The paper provides diagrams (e.g., Figure 1, Figure 2) to illustrate processes but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/salesforce/coco-dst |
| Open Datasets | Yes | We train each of these three models following their publicly released implementations on the standard train/dev/test split of MultiWOZ 2.1 (Eric et al., 2019). |
| Dataset Splits | Yes | We train each of these three models following their publicly released implementations on the standard train/dev/test split of MultiWOZ 2.1 (Eric et al., 2019). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using T5-small, BERT-base-uncased, the Adam optimizer, and PyTorch/Fairseq for the NMT models, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | During training, we use the Adam optimizer (Kingma and Ba, 2015) with an initial learning rate of 5e-5 and a linear warmup of 200 steps. The batch size is set to 36 and the number of training epochs to 10. The maximum sequence length of both the encoder and the decoder is set to 100. |
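
For concreteness, below is a minimal, hypothetical sketch of the reported experiment setup (T5-small, Adam with learning rate 5e-5, 200 linear warmup steps, batch size 36, 10 epochs, maximum sequence length 100 for both encoder and decoder), using the Hugging Face `transformers` library. The dataset pairs are placeholders, not the paper's actual MultiWOZ 2.1 preprocessing; treat this as an illustration of the hyperparameters rather than the authors' released training code.

```python
# Hypothetical sketch of the reported fine-tuning setup; the data here is a
# stand-in, not the paper's MultiWOZ 2.1 pipeline.
import torch
from torch.utils.data import DataLoader
from transformers import (T5ForConditionalGeneration, T5TokenizerFast,
                          get_linear_schedule_with_warmup)

MAX_LEN, BATCH_SIZE, EPOCHS, LR, WARMUP_STEPS = 100, 36, 10, 5e-5, 200

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Placeholder (source, target) pairs; replace with MultiWOZ 2.1 turn-level examples.
pairs = [("belief: hotel area = north", "i want a hotel in the north .")]

def collate(batch):
    sources, targets = zip(*batch)
    enc = tokenizer(list(sources), max_length=MAX_LEN, truncation=True,
                    padding="max_length", return_tensors="pt")
    dec = tokenizer(list(targets), max_length=MAX_LEN, truncation=True,
                    padding="max_length", return_tensors="pt")
    # Ignore pad tokens in the loss.
    labels = dec.input_ids.masked_fill(dec.input_ids == tokenizer.pad_token_id, -100)
    return enc.input_ids, enc.attention_mask, labels

loader = DataLoader(pairs, batch_size=BATCH_SIZE, shuffle=True, collate_fn=collate)

optimizer = torch.optim.Adam(model.parameters(), lr=LR)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=WARMUP_STEPS,
    num_training_steps=EPOCHS * len(loader))

model.train()
for epoch in range(EPOCHS):
    for input_ids, attention_mask, labels in loader:
        input_ids, attention_mask, labels = (
            t.to(device) for t in (input_ids, attention_mask, labels))
        loss = model(input_ids=input_ids, attention_mask=attention_mask,
                     labels=labels).loss
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```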