Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Topic-Aware Multi-turn Dialogue Modeling
Authors: Yi Xu, Hai Zhao, Zhuosheng Zhang
Pages: 14176-14184
AAAI 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on three public datasets show TADAM can outperform the state-of-the-art method, especially by 3.3% on E-commerce dataset that has an obvious topic shift. |
| Researcher Affiliation | Academia | 1 Department of Computer Science and Engineering, Shanghai Jiao Tong University 2 Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, China 3 MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University |
| Pseudocode | Yes | Algorithm 1 Topic-aware Segmentation Algorithm |
| Open Source Code | Yes | Both datasets and code are available at https://github.com/xyease/TADAM |
| Open Datasets | Yes | For Chinese, we annotate a dataset including 505 phone records of customer service on banking consultation. For English, we build dataset including 711 dialogues by joining dialogues from existing multi-turn dialogue datasets: MultiWOZ Corpus² (Budzianowski et al. 2018) and Stanford Dialog Dataset (Eric et al. 2017). (Footnote 2: https://doi.org/10.17863/CAM.41572) Additionally, Ubuntu Corpus (Lowe et al. 2015), Douban Corpus (Wu et al. 2017), and E-commerce Corpus (Zhang et al. 2018) were used. |
| Dataset Splits | No | The paper mentions training on datasets and evaluating on test sets but does not explicitly provide details about a separate validation dataset split with percentages or counts. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using "pre-trained BERT (Devlin et al. 2018)" and links to "https://github.com/huggingface/transformers", implying the use of the HuggingFace Transformers library, but does not provide specific version numbers for Python, PyTorch, TensorFlow, or the Transformers library itself. |
| Experiment Setup | Yes | For topic-aware segmentation: In both datasets, we set range R = 8, jump step K = 2 (value of 1 will lead to fragmentation), window size d = 2 and threshold θ_cost = 0.6. For response selection: We apply topic-aware segmentation algorithm to Ubuntu, Douban and E-commerce with range R = 2, 2, 6... the max input sequence length is set to 350... max number of segments is 10. We set the learning rate as 2e-5 using BertAdam with a warmup proportion of 10%. Our model is trained with batch size of {20,32,20} and epoch of {3,3,4}... the α of word-level weights is set to 0.5. |
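
The response-selection settings quoted in the Experiment Setup row can be collected into a small configuration sketch. This is an illustration, not code from the TADAM repository: the class name, field names, and the `warmup_steps` helper are hypothetical, and mapping the batch sizes {20,32,20} and epochs {3,3,4} to Ubuntu, Douban, and E-commerce in that order is an assumption based on the order in which the datasets are listed. The training-set size passed to the helper is a placeholder, not a figure from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseSelectionConfig:
    """Hypothetical container for the hyperparameters quoted in the table."""
    segment_range: int               # range R used by topic-aware segmentation
    batch_size: int
    epochs: int
    learning_rate: float = 2e-5
    warmup_proportion: float = 0.10  # 10% warmup
    max_seq_length: int = 350
    max_segments: int = 10
    word_weight_alpha: float = 0.5   # alpha of word-level weights


# Assumption: values are listed in the order Ubuntu, Douban, E-commerce.
CONFIGS = {
    "ubuntu": ResponseSelectionConfig(segment_range=2, batch_size=20, epochs=3),
    "douban": ResponseSelectionConfig(segment_range=2, batch_size=32, epochs=3),
    "ecommerce": ResponseSelectionConfig(segment_range=6, batch_size=20, epochs=4),
}


def warmup_steps(cfg: ResponseSelectionConfig, num_train_examples: int) -> tuple[int, int]:
    """Return (total optimizer steps, warmup steps) for a given training-set size."""
    steps_per_epoch = math.ceil(num_train_examples / cfg.batch_size)
    total = steps_per_epoch * cfg.epochs
    return total, int(total * cfg.warmup_proportion)


if __name__ == "__main__":
    # 2000 is a placeholder training-set size, not taken from the paper.
    total, warmup = warmup_steps(CONFIGS["ecommerce"], num_train_examples=2000)
    print(f"total steps: {total}, warmup steps: {warmup}")
```

A frozen dataclass keeps the shared defaults (learning rate, warmup proportion, sequence length) in one place while the per-dataset values stay explicit, which makes it easy to spot any setting the paper leaves unspecified.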