Dialog Inpainting: Turning Documents into Dialogs
Authors: Zhuyun Dai, Arun Tejasvi Chaganty, Vincent Y Zhao, Aida Amini, Qazi Mamunur Rashid, Mike Green, Kelvin Guu
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By applying this approach to passages from Wikipedia and the web, we produce WikiDialog and WebDialog, two datasets totalling 19 million diverse information-seeking dialogs, 1,000x larger than the largest existing ConvQA dataset. Furthermore, human raters judge the answer adequacy and conversationality of WikiDialog to be as good or better than existing manually-collected datasets. Remarkably, our approach shows strong zero-shot capability, generating high quality synthetic data without using any in-domain ConvQA data. Using our inpainted data to pre-train ConvQA retrieval systems, we significantly advance state-of-the-art across three benchmarks (QReCC, OR-QuAC, TREC CAsT) yielding up to 40% relative gains on standard evaluation metrics. |
| Researcher Affiliation | Industry | Google Inc., Mountain View, USA. |
| Pseudocode | No | The paper describes methods textually and mathematically but does not include pseudocode or clearly labeled algorithm blocks. (An illustrative sketch of the inpainting task appears after this table.) |
| Open Source Code | No | The paper states: 'We released WikiDialog at https://github.com/google-research/dialog-inpainting', but this link points to the generated dataset, not the source code for the dialog inpainting methodology itself. |
| Open Datasets | Yes | We use four open-domain conversational QA retrieval benchmarks: OR-QuAC (Qu et al., 2020), TREC CAsT-19 (Byrne et al., 2019), TREC CAsT-20 (Dalton et al., 2020), and QReCC (Anantha et al., 2021). |
| Dataset Splits | Yes | During fine-tuning, we separately train retrievers and rerankers on OR-QuAC and QReCC, using their validation sets to select checkpoints. Because CAsT-19 and CAsT-20 are extremely small datasets and do not include a training split, we do not fine-tune dual-encoder retrievers on these datasets... We follow Yu et al. (2021) and use 5-fold cross-validation to finetune rerankers on CAsT-19 and CAsT-20: for each fold, we split the data into 5 splits based on dialogs, train a reranker on 3 splits of the data, select a checkpoint on one split and test on the remaining split. (A sketch of this fold protocol appears after the table.) |
| Hardware Specification | Yes | Unless otherwise specified, all our dialog inpainters are initialized from T5-XXL (11B parameters) and finetuned using 64 TPU v3 chips with constant learning rate 0.01, dropout rate 0.1 and batch size 128. |
| Software Dependencies | No | The paper mentions using T5 and implementing in JAX but does not specify version numbers for these or any other software libraries required for replication. |
| Experiment Setup | Yes | Unless otherwise specified, all our dialog inpainters are initialized from T5-XXL (11B parameters) and finetuned using 64 TPU v3 chips with constant learning rate 0.01, dropout rate 0.1 and batch size 128. For pre-training on our inpainted datasets, we used a softmax temperature τ of 0.01, batch size 2048, and dropout rate 0.1. The models were trained with the Adafactor optimizer with learning rate 1e-3 and 1k warmup steps. (These hyperparameters are collected in a config sketch after the table.) |
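As a companion to the Pseudocode row: a minimal sketch of the dialog inpainting idea the paper describes, where a passage's sentences are kept verbatim as one speaker's turns and a T5 model fills in the imagined reader's turns one masked turn at a time. The serialization, mask token, and `generate` callable below are illustrative assumptions, not the authors' exact format.

```python
MASK = "<extra_id_0>"  # T5 sentinel token, standing in here for the masked turn

def inpaint_dialog(passage_sentences, generate):
    """Autoregressively inpaint a reader turn before each writer sentence.

    `generate` is any callable mapping an input string to the model's
    prediction for the masked turn (e.g. a finetuned T5 decode call).
    """
    dialog = []
    for sentence in passage_sentences:
        prompt = " ".join(dialog + [MASK, sentence])
        question = generate(prompt)          # model predicts the masked turn
        dialog.extend([question, sentence])  # commit, then inpaint the next turn
    return dialog

# Usage with a stub model in place of the T5-XXL inpainter:
fake_generate = lambda prompt: "<inpainted question>"
print(inpaint_dialog(
    ["The Eiffel Tower is a wrought-iron lattice tower in Paris."],
    fake_generate,
))
```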
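The 5-fold protocol quoted in the Dataset Splits row can be made concrete with a short sketch. Splitting is done by dialog so that no conversation leaks across folds; the function names and seed are our own placeholders, not the authors' code.

```python
import random

def five_fold_splits(dialog_ids, seed=0):
    """Partition dialog IDs into 5 disjoint groups, splitting by dialog."""
    ids = list(dialog_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::5] for i in range(5)]

def cross_validation_folds(dialog_ids):
    """Yield (train, dev, test) groups: 3 splits train, 1 dev, 1 test per fold."""
    splits = five_fold_splits(dialog_ids)
    for fold in range(5):
        test = splits[fold]
        dev = splits[(fold + 1) % 5]  # checkpoint-selection split
        train = [d for i, s in enumerate(splits)
                 if i not in (fold, (fold + 1) % 5) for d in s]
        yield train, dev, test

# Each fold would train a reranker on `train`, pick a checkpoint on `dev`,
# and report metrics on `test` (training loop omitted here).
for train, dev, test in cross_validation_folds(range(50)):
    assert not set(test) & (set(train) | set(dev))  # folds stay disjoint
```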
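Finally, the training hyperparameters quoted in the Experiment Setup row, gathered in one place. The paper says only that the implementation uses JAX; the use of optax and the exact warmup-then-constant schedule below are our assumptions, and only the numeric values come from the paper.

```python
import optax

WARMUP_STEPS = 1_000  # "1k warmup steps"
PEAK_LR = 1e-3        # "learning rate 1e-3"

# Linear warmup to 1e-3 over 1k steps, constant thereafter (our reading).
schedule = optax.join_schedules(
    [optax.linear_schedule(0.0, PEAK_LR, WARMUP_STEPS),
     optax.constant_schedule(PEAK_LR)],
    boundaries=[WARMUP_STEPS],
)
optimizer = optax.adafactor(learning_rate=schedule)

# Remaining quoted hyperparameters, grouped for reference (names are ours):
INPAINTER_FINETUNE = dict(init="T5-XXL (11B)", learning_rate=0.01,
                          dropout_rate=0.1, batch_size=128)
RETRIEVER_PRETRAIN = dict(softmax_temperature=0.01, batch_size=2048,
                          dropout_rate=0.1)
```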