DialogLM: Pre-trained Model for Long Dialogue Understanding and Summarization
Authors: Ming Zhong, Yang Liu, Yichong Xu, Chenguang Zhu, Michael Zeng
AAAI 2022, pp. 11765-11773 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on five datasets of long dialogues, covering tasks of dialogue summarization, abstractive question answering and topic segmentation. Experimentally, we show that our pre-trained model DIALOGLM significantly surpasses the state-of-the-art models across datasets and tasks. |
| Researcher Affiliation | Collaboration | Ming Zhong*1; 1University of Illinois at Urbana-Champaign, 2Microsoft Cognitive Services Research Group; mingz5@illinois.edu, {yaliu10, yichong.xu, chezhu, nzeng}@microsoft.com |
| Pseudocode | No | The paper includes diagrams and describes the steps of its method in prose, but there are no explicitly labeled pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | Source code and all the pre-trained models are available on our GitHub repository (https://github.com/microsoft/DialogLM). |
| Open Datasets | Yes | Pre-training data is the combination of the MediaSum dataset (Zhu et al. 2021) and the OpenSubtitles corpus (Lison and Tiedemann 2016) (see Table 2). |
| Dataset Splits | No | The paper mentions using well-known datasets such as AMI, ICSI, QMSum, ForeverDreaming, and TVMegaSite, and refers to a 'test set' for evaluation. However, it does not provide specific details on training, validation, and test splits (e.g., percentages or sample counts) for any of these datasets. |
| Hardware Specification | Yes | 8 A100 GPUs with 40GB memory are used to complete the experiments in this paper. |
| Software Dependencies | No | The paper mentions the use of models like UNILM and Transformer, but it does not specify any software libraries, frameworks, or programming languages with their version numbers that were used for the experiments. |
| Experiment Setup | Yes | To pre-train DIALOGLM, we further train UNILM with the window-based denoising framework for a total of 200,000 steps on dialogue data, of which 20,000 are warmup steps. We set the batch size to 64 and the maximum learning rate to 2e-5. (A hedged configuration sketch based on these reported values follows the table.) |
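
The pre-training hyperparameters quoted above can be restated as a minimal configuration sketch. This is not the authors' released code (that lives in the linked GitHub repository); it only captures the reported values in a Hugging Face `TrainingArguments`-style setup, and the output directory, learning-rate schedule, per-device batch split, and omitted data pipeline are assumptions.

```python
# Minimal sketch of the reported DialogLM pre-training configuration.
# Assumptions: a Hugging Face Trainer-style setup; the 64-example batch is
# split evenly across the 8 A100 (40GB) GPUs mentioned in the paper.
# The window-based denoising data pipeline is omitted; see
# https://github.com/microsoft/DialogLM for the authors' implementation.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="dialoglm-pretrain",  # placeholder path
    max_steps=200_000,               # total pre-training steps (paper)
    warmup_steps=20_000,             # warmup steps (paper)
    learning_rate=2e-5,              # maximum learning rate (paper)
    lr_scheduler_type="linear",      # assumption: schedule not stated in the excerpt
    per_device_train_batch_size=8,   # 8 GPUs x 8 = effective batch size 64 (assumed split)
)
```

The effective batch size of 64 could equally be reached with gradient accumulation on fewer devices; the even per-device split shown here is only one plausible reading of the reported setup.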