Heterogeneous-Branch Collaborative Learning for Dialogue Generation
Authors: Yiwei Li, Shaoxiong Feng, Bin Sun, Kan Li
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations on two widely used open-domain dialogue datasets demonstrate that the proposed approach significantly improves the branch heterogeneity and outperforms the state-of-the-art collaborative learning methods. |
| Researcher Affiliation | Academia | Yiwei Li, Shaoxiong Feng, Bin Sun, Kan Li*; School of Computer Science, Beijing Institute of Technology; {liyiwei,shaoxiongfeng,binsun,likan}@bit.edu.cn |
| Pseudocode | No | The paper presents mathematical formulas and figures (e.g., Figure 2, Figure 3), but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | No | The paper does not include any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We evaluate the proposed method using two widely used dialogue datasets: Daily Dialog, a collection of conversations that represent human daily communication (Li et al. 2017), and Open Subtitles, which consists of large-scale dialogues extracted from movie subtitles (Tiedemann 2009). |
| Dataset Splits | Yes | After data preprocessing, the number of context-response pairs in training/validation/test set is 68,066/6,820/6,841 for Daily Dialog, and 200,000/20,000/10,000 for Open Subtitles. |
| Hardware Specification | Yes | We implement all approaches with Pytorch 1.11, and conduct all experiments on NVIDIA TITAN RTX. |
| Software Dependencies | Yes | We implement all approaches with Pytorch 1.11, and conduct all experiments on NVIDIA TITAN RTX. |
| Experiment Setup | Yes | Each branch is built on the lightweight model architecture (Small Transformer): the encoder and decoder contain only 2 layers, in which the self-attention module has 4 attention heads and 1024 feed-forward units. The size of hidden states is set to 256. Dropout (Srivastava et al. 2014) is used for the self-attention module, the feed-forward layer, and the activation layer, and the rate of all three is set to 0.1. The batch size is set to 64. The selection ratio for attribute-specific subset is 70%. For the temperature coefficient t, we simply set it to 1. Beam search with a size of 5 is used for decoding. |
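To make the quoted experiment setup concrete, the following is a minimal PyTorch sketch of a single "Small Transformer" branch with the reported hyperparameters (2 encoder/decoder layers, 4 attention heads, 1024 feed-forward units, hidden size 256, dropout 0.1, batch size 64). This is not the authors' implementation: the class name, vocabulary size, and sequence lengths are illustrative assumptions, and positional encodings and the paper's collaborative-learning objective are omitted.

```python
# Minimal sketch (assumed, not the authors' code) of one Small Transformer branch
# using the hyperparameters quoted in the Experiment Setup row.
import torch
import torch.nn as nn


class SmallTransformerBranch(nn.Module):
    def __init__(self, vocab_size: int = 30000, d_model: int = 256,
                 nhead: int = 4, num_layers: int = 2,
                 dim_feedforward: int = 1024, dropout: float = 0.1):
        super().__init__()
        # Token embedding; positional encoding omitted for brevity.
        self.embedding = nn.Embedding(vocab_size, d_model)
        # 2-layer encoder and decoder, 4 heads, 1024 FF units, dropout 0.1.
        self.transformer = nn.Transformer(
            d_model=d_model,
            nhead=nhead,
            num_encoder_layers=num_layers,
            num_decoder_layers=num_layers,
            dim_feedforward=dim_feedforward,
            dropout=dropout,
            batch_first=True,
        )
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids: torch.Tensor, tgt_ids: torch.Tensor) -> torch.Tensor:
        # src_ids: (batch, src_len) context tokens; tgt_ids: (batch, tgt_len) response tokens.
        src = self.embedding(src_ids)
        tgt = self.embedding(tgt_ids)
        # Causal mask so each response position only attends to earlier positions.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.lm_head(hidden)  # logits over the vocabulary


if __name__ == "__main__":
    model = SmallTransformerBranch()
    src = torch.randint(0, 30000, (64, 32))  # batch size 64, as in the quoted setup
    tgt = torch.randint(0, 30000, (64, 32))
    logits = model(src, tgt)
    print(logits.shape)  # torch.Size([64, 32, 30000])
```

In the paper's collaborative setting, several such branches would be trained jointly with a shared objective and a temperature coefficient of 1; decoding would then use beam search with a beam size of 5 rather than the greedy forward pass shown here.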