Heterogeneous-Branch Collaborative Learning for Dialogue Generation

Authors: Yiwei Li, Shaoxiong Feng, Bin Sun, Kan Li

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluations on two widely used open-domain dialogue datasets demonstrate that the proposed approach significantly improves the branch heterogeneity and outperforms the state-of-the-art collaborative learning methods.
Researcher Affiliation | Academia | Yiwei Li, Shaoxiong Feng, Bin Sun, Kan Li*, School of Computer Science, Beijing Institute of Technology. {liyiwei,shaoxiongfeng,binsun,likan}@bit.edu.cn
Pseudocode | No | The paper presents mathematical formulas and figures (e.g., Figure 2, Figure 3), but no explicit pseudocode or algorithm blocks are provided.
Open Source Code | No | The paper contains no statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We evaluate the proposed method using two widely used dialogue datasets: Daily Dialog, a collection of conversations that represent human daily communication (Li et al. 2017), and Open Subtitles, which consists of large-scale dialogues extracted from movie subtitles (Tiedemann 2009).
Dataset Splits | Yes | After data preprocessing, the number of context-response pairs in the training/validation/test sets is 68,066/6,820/6,841 for Daily Dialog, and 200,000/20,000/10,000 for Open Subtitles.
Hardware Specification | Yes | We implement all approaches with Pytorch 1.11, and conduct all experiments on NVIDIA TITAN RTX.
Software Dependencies | Yes | We implement all approaches with Pytorch 1.11, and conduct all experiments on NVIDIA TITAN RTX.
Experiment Setup | Yes | Each branch is built on the lightweight model architecture (Small Transformer): the encoder and decoder contain only 2 layers, in which the self-attention module has 4 attention heads and 1024 feed-forward units. The size of hidden states is set to 256. Dropout (Srivastava et al. 2014) is used for the self-attention module, the feed-forward layer, and the activation layer, and the rate of all three is set to 0.1. The batch size is set to 64. The selection ratio for the attribute-specific subset is 70%. For the temperature coefficient t, we simply set it to 1. Beam search with a size of 5 is used for decoding.
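The quoted setup pins down one branch's architecture precisely enough to sketch in code. Below is a minimal PyTorch sketch of a single "Small Transformer" branch under those hyperparameters; it is an illustration, not the authors' implementation. The vocabulary size and sequence length are placeholders (not stated in the quoted setup), and the collaborative-learning pieces (temperature coefficient, attribute-specific subset selection, beam-search decoding) are omitted.

```python
import torch
import torch.nn as nn

# Placeholders: the quoted setup does not state vocabulary size or max length.
VOCAB_SIZE = 32000
HIDDEN = 256  # "The size of hidden states is set to 256."

class SmallTransformerBranch(nn.Module):
    """One lightweight branch, per the quoted hyperparameters."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.transformer = nn.Transformer(
            d_model=HIDDEN,
            nhead=4,                  # 4 attention heads
            num_encoder_layers=2,     # 2-layer encoder
            num_decoder_layers=2,     # 2-layer decoder
            dim_feedforward=1024,     # 1024 feed-forward units
            dropout=0.1,              # dropout rate 0.1 throughout
            batch_first=True,
        )
        self.out = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, src_ids, tgt_ids):
        src = self.embed(src_ids)
        tgt = self.embed(tgt_ids)
        hidden = self.transformer(src, tgt)
        return self.out(hidden)  # logits over the vocabulary

# Usage sketch with the quoted batch size of 64; lengths are arbitrary.
branch = SmallTransformerBranch()
src = torch.randint(0, VOCAB_SIZE, (64, 20))  # context token ids
tgt = torch.randint(0, VOCAB_SIZE, (64, 20))  # (shifted) response token ids
logits = branch(src, tgt)                     # shape: (64, 20, VOCAB_SIZE)
```

At roughly 256-dimensional states and two layers per side, each branch stays small enough that several heterogeneous branches can be trained together on a single NVIDIA TITAN RTX, consistent with the hardware quoted above.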