Unsupervised Abstractive Dialogue Summarization for Tete-a-Tetes
Authors: Xinyuan Zhang, Ruiyi Zhang, Manzil Zaheer, Amr Ahmed
AAAI 2021, pp. 14489-14497
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that SuTaT is superior on unsupervised dialogue summarization under both automatic and human evaluations, and is capable of dialogue classification and single-turn conversation generation. |
| Researcher Affiliation | Collaboration | Xinyuan Zhang (ASAPP), Ruiyi Zhang (Duke University), Manzil Zaheer (Google Research), Amr Ahmed (Google Research) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states that 'These reference summaries cover all domains in both datasets and will be released later,' referring to reference summaries, but provides no explicit statement or link for releasing the model's source code. |
| Open Datasets | Yes | The experiments are conducted on two dialogue datasets: MultiWOZ-2.0 (Budzianowski et al. 2018) and Taskmaster-1 (Byrne et al. 2019). |
| Dataset Splits | Yes | In the experiment, we split the dataset into 8438, 1000, and 1000 dialogues for training, testing, and validation. Taskmaster consists of 7708 written dialogues... The dataset is split into 6168, 770, and 770 dialogues for training, testing, and validation. |
| Hardware Specification | Yes | SuTaT is implemented in PyTorch and trained using an NVIDIA Tesla V100 GPU with 16GB of memory. |
| Software Dependencies | No | The paper mentions 'pytorch' but does not specify a version number or other software dependencies with versions. |
| Experiment Setup | Yes | For KL annealing, the initial weights of the KL terms are 0 and are gradually increased as training progresses until they reach the KL threshold of 0.8; the rate of this increase is 0.5 with respect to the total number of batches. The word dropout rate during decoding is 0.4. The latent variable size is 300 for both the customer and agent latent variables. α, which controls the weights of the two objective functions in Equation 4, is set to 0.4. The word embedding size is 300. For the bidirectional LSTM encoder and LSTM decoder, the number of hidden layers is 1 and the hidden unit size is 600. For the Transformer encoder and decoder, the number of hidden layers is 1 and the number of heads in the multi-head attention is 10. The number of heads in the sentence-level self-attention is also 10. The hidden unit size of the MLPs in p(zy\|zx) is 600. The annealing parameter τ for soft-argmax in Equation 5 is set to 0.01. During training, the learning rate is 0.0005, the batch size is 16, and the maximum number of epochs is 10. A configuration sketch consolidating these values is given below the table. |
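
To make the reported hyperparameters easier to reuse, the sketch below collects them into a single configuration and implements a linear KL-annealing schedule consistent with the description above. This is a hypothetical reconstruction, not the authors' released code; the names (`config`, `kl_weight`) and the exact annealing formula are assumptions inferred from the reported threshold (0.8) and rate (0.5 with respect to the total number of batches).

```python
# Hypothetical configuration sketch for the SuTaT setup reported above.
# All values come from the "Experiment Setup" row; the structure is an assumption.

config = {
    "kl_threshold": 0.8,        # maximum weight reached by the annealed KL terms
    "kl_anneal_rate": 0.5,      # increase rate, relative to the total number of batches
    "word_dropout": 0.4,        # word dropout rate during decoding
    "latent_size": 300,         # customer and agent latent variable sizes
    "alpha": 0.4,               # weight between the two objectives in Equation 4
    "embedding_size": 300,
    "lstm_hidden_size": 600,    # bidirectional LSTM encoder / LSTM decoder
    "num_hidden_layers": 1,
    "num_attention_heads": 10,  # Transformer multi-head and sentence-level self-attention
    "mlp_hidden_size": 600,     # MLPs in p(z_y | z_x)
    "tau": 0.01,                # soft-argmax annealing parameter (Equation 5)
    "learning_rate": 5e-4,
    "batch_size": 16,
    "max_epochs": 10,
}


def kl_weight(step: int, total_batches: int,
              threshold: float = config["kl_threshold"],
              rate: float = config["kl_anneal_rate"]) -> float:
    """Linear KL annealing (assumed schedule): the weight starts at 0 and grows
    with training progress, saturating at `threshold` after `rate * total_batches`
    optimization steps."""
    return min(threshold, threshold * step / max(1.0, rate * total_batches))
```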