Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Progressive Open-Domain Response Generation with Multiple Controllable Attributes

Authors: Haiqin Yang, Xiaoyuan Yao, Yiqun Duan, Jianping Shen, Jie Zhong, Kun Zhang

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we conduct extensive evaluations to show that PHED significantly outperforms the state-of-the-art neural generation models and produces more diverse responses as expected. The contribution of our work is threefold: [...] (3) empirical evaluations clearly demonstrating the effectiveness of PHED.
Researcher Affiliation | Collaboration | Haiqin Yang¹, Xiaoyuan Yao¹, Yiqun Duan¹, Jianping Shen¹, Jie Zhong¹ and Kun Zhang². ¹Ping An Life Insurance Company of China, ²Carnegie Mellon University
Pseudocode | No | The paper does not include a section explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured pseudocode blocks.
Open Source Code | Yes | Our implementation is in PyTorch¹. ¹https://www.dropbox.com/s/1376kmhvuaxqe5h/PHED.zip?dl=0
Open Datasets | Yes | The data is the short-text conversation dataset (STC) [Shang et al., 2015], collected from Sina Weibo, a Chinese social platform.
Dataset Splits | Yes | After setting the maximum number of characters in a response to 30, we obtain around 3.9 million dialog pairs and split them into the set of training, validation, and test with the ratio of 90%, 5%, and 5%, respectively.
Hardware Specification | Yes | Under the above settings, we train PHED 10 epochs at each stage on a Tesla V100 GPU and cost about 51 hours.
Software Dependencies | No | The paper mentions 'PyTorch' as the implementation framework but does not specify a version number or other software dependencies with their versions.
Experiment Setup | Yes | For each Transformer block, we set the number of self-attention heads to 8 and the hidden size (H) to 512. [...] trained by ADAM with the learning rate 0.0001 and the batch size of 32. In the inference, we set the beam search size to 5.
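
For readers who want to re-create the reported configuration, the sketch below collects the quoted hyperparameters and the 90%/5%/5% split in plain Python. It is an illustration under stated assumptions, not the authors' code: the names `ReportedSetup` and `split_pairs` are hypothetical, the per-head dimension assumes the standard convention of dividing the hidden size evenly across heads, and the length filter, shuffle, and seed are guesses, since the summary only states the 30-character limit and the ratios.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the summary rows (the class itself is hypothetical)."""
    num_heads: int = 8            # self-attention heads per Transformer block
    hidden_size: int = 512        # hidden size H
    learning_rate: float = 1e-4   # Adam learning rate
    batch_size: int = 32
    beam_size: int = 5            # beam search size at inference
    epochs_per_stage: int = 10
    max_response_chars: int = 30

    @property
    def head_dim(self) -> int:
        # Assumption: H is divided evenly across heads, as in standard
        # multi-head attention.
        return self.hidden_size // self.num_heads


def split_pairs(pairs, setup, percents=(90, 5, 5), seed=0):
    """Split (post, response) pairs into train/validation/test lists.

    Assumptions: responses longer than the limit are dropped (the source
    does not say whether they are dropped or truncated), and the pairs are
    shuffled with a fixed seed before splitting.
    """
    kept = [(post, resp) for post, resp in pairs
            if len(resp) <= setup.max_response_chars]
    random.Random(seed).shuffle(kept)
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    n_train = len(kept) * percents[0] // 100
    n_valid = len(kept) * percents[1] // 100
    train = kept[:n_train]
    valid = kept[n_train:n_train + n_valid]
    test = kept[n_train + n_valid:]  # remainder goes to the test set
    return train, valid, test
```

Applied to the roughly 3.9 million pairs mentioned above, these ratios give about 3.51 million training pairs and about 195,000 pairs each for validation and test.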