Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Heterogeneous-Branch Collaborative Learning for Dialogue Generation
Authors: Yiwei Li, Shaoxiong Feng, Bin Sun, Kan Li
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations on two widely used open-domain dialogue datasets demonstrate that the proposed approach significantly improves the branch heterogeneity and outperforms the state-of-the-art collaborative learning methods. |
| Researcher Affiliation | Academia | Yiwei Li, Shaoxiong Feng, Bin Sun, Kan Li. School of Computer Science, Beijing Institute of Technology. |
| Pseudocode | No | The paper presents mathematical formulas and figures (e.g., Figure 2, Figure 3), but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | No | The paper does not include any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We evaluate the proposed method using two widely used dialogue datasets: DailyDialog, a collection of conversations that represent human daily communication (Li et al. 2017), and OpenSubtitles, which consists of large-scale dialogues extracted from movie subtitles (Tiedemann 2009). |
| Dataset Splits | Yes | After data preprocessing, the number of context-response pairs in training/validation/test set is 68,066/6,820/6,841 for DailyDialog, and 200,000/20,000/10,000 for OpenSubtitles. |
| Hardware Specification | Yes | We implement all approaches with PyTorch 1.11, and conduct all experiments on NVIDIA TITAN RTX. |
| Software Dependencies | Yes | We implement all approaches with PyTorch 1.11, and conduct all experiments on NVIDIA TITAN RTX. |
| Experiment Setup | Yes | Each branch is built on the lightweight model architecture (Small Transformer): the encoder and decoder contain only 2 layers, in which the self-attention module has 4 attention heads and 1024 feed-forward units. The size of hidden states is set to 256. Dropout (Srivastava et al. 2014) is used for the self-attention module, the feed-forward layer, and the activation layer, and the rate of all three is set to 0.1. The batch size is set to 64. The selection ratio for attribute-specific subset is 70%. For the temperature coefficient t, we simply set it to 1. Beam search with a size of 5 is used for decoding. |
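
As a rough illustration, the hyperparameters quoted in the Experiment Setup row can be gathered into one configuration object. This is only a sketch and not the authors' code, since no source code is released. The class name `SmallTransformerConfig`, its field names, and the `head_dim` helper are hypothetical; only the numeric values come from the quoted text. The per-head dimension assumes the usual multi-head attention convention of splitting the hidden size evenly across heads, which the quoted text does not state.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SmallTransformerConfig:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""

    num_encoder_layers: int = 2
    num_decoder_layers: int = 2
    num_attention_heads: int = 4
    feed_forward_units: int = 1024
    hidden_size: int = 256
    dropout: float = 0.1           # same rate for attention, feed-forward, activation
    batch_size: int = 64
    subset_selection_ratio: float = 0.70  # attribute-specific subset ratio
    temperature: float = 1.0       # temperature coefficient t
    beam_size: int = 5             # beam search width used for decoding

    def head_dim(self) -> int:
        # Assumption: hidden size is split evenly across attention heads.
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        return self.hidden_size // self.num_attention_heads


config = SmallTransformerConfig()
for name, value in asdict(config).items():
    print(f"{name}: {value}")
print("head_dim:", config.head_dim())
```

Under that even-split assumption, each of the 4 heads would operate on 256 / 4 = 64 dimensions. Details the row does not report, such as the optimizer, learning rate, and vocabulary size, are deliberately left out rather than guessed.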