Stylized Dialogue Response Generation Using Stylized Unpaired Texts

Authors: Yinhe Zheng, Zikai Chen, Rongsheng Zhang, Shilei Huang, Xiaoxi Mao, Minlie Huang (pp. 14558-14567)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Automatic and manual evaluations on two datasets demonstrate that our method outperforms competitive baselines in producing coherent and style-intensive dialogue responses.
Researcher Affiliation Collaboration 1 Department of Computer Science and Technology, Institute for Artificial Intelligence, State Key Lab of Intelligent Technology and Systems, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China. 2 Samsung Research China Beijing (SRC-B), Beijing, China. 3 Fuxi AI Lab, NetEase Inc., Hangzhou, China.
Pseudocode Yes Algorithm 1: Joint training process (a minimal Python sketch of this loop follows the table).
Input: M unpaired texts D_s = {t_i}_{i=1}^{M} in style S1; N dialogue pairs D_p = {⟨x_i, y_i⟩}_{i=1}^{N} in style S0.
Output: Stylized dialogue model.
1: Initialize the stylized and inverse dialogue models e, d, ê, d̂
2: while not converged do
3:   Sample n_d dialogue pairs D_p^b = {⟨x_i, y_i⟩}_{i=1}^{n_d} ⊆ D_p
4:   Train e and d by optimizing L_p2r (Eq. 6) on D_p^b
5:   Train ê and d̂ by optimizing L_r2p (Eq. 7) on D_p^b
6:   if current step > N_f then
7:     D_pp ← ∅
8:     Sample n_s stylized texts D_s^b = {t_i}_{i=1}^{n_s} ⊆ D_s
9:     for each t_i ∈ D_s^b do
10:      Decode m posts {x'_{ij}}_{j=1}^{m} from p_d̂(x | ê(t_i))
11:      D_pp ← D_pp ∪ {⟨x'_{ij}, t_i⟩}_{j=1}^{m}
12:    end for
13:    Train e and d by optimizing L_inv (Eq. 8) on D_pp
14:  end if
15: end while
Open Source Code No The paper does not state that its source code is released; it only promises the data: "The WDJN dataset will be released for public use."
Open Datasets Yes We collected 300K Weibo Dialogues (style S0) as Dp and sampled 95.1K stylized unpaired texts that are wrapped in quotation marks in Jin Yong's novels (style S1) as Ds. The WDJN dataset will be released for public use. TCFC (Wu, Wang, and Liu 2020): This dataset focuses on formality in English writing. We sampled 217.2K informal dialogue pairs (style S0) as Dp and 500.0K formal texts (style S1) as Ds from the original dataset, and used the test data in the original dataset as our test set Dt, which contains 1,956 manually crafted dialogue pairs (978 informal pairs and 978 formal pairs).
Dataset Splits No The paper describes training and test sets in Table 1 and in the text, but does not explicitly mention or quantify a validation set split for hyperparameter tuning or early stopping.
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies No The paper mentions using pre-trained CDial-GPT and DialoGPT models, but does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup Yes The top-K sampling process in Algorithm 1 uses K = 20 with a beam size of 4 (WDJN) or 2 (TCFC); see the generic top-K sketch below the table. N_f is set to 300. Training stops after 10 epochs over Dp (WDJN) or after 8,000 update steps (TCFC).
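
The following is a minimal Python sketch of the joint training loop in Algorithm 1. The helper names (train_forward, train_inverse, train_pseudo, decode_posts) and the batch-size defaults are hypothetical stand-ins, not names from the paper; the actual losses L_p2r (Eq. 6), L_r2p (Eq. 7), and L_inv (Eq. 8) depend on model internals not reproduced here.

import random

def joint_train(dialogue_pairs, stylized_texts,
                train_forward, train_inverse, train_pseudo, decode_posts,
                n_d=32, n_s=32, m=4, n_f=300, max_steps=8000):
    """Sketch of Algorithm 1.

    dialogue_pairs: list of (post, response) pairs in style S0 (D_p).
    stylized_texts: list of unpaired texts in style S1 (D_s).
    The four callables stand in for the paper's loss-specific updates.
    """
    for step in range(1, max_steps + 1):  # stands in for "while not converged"
        # Lines 3-5: supervised updates on a mini-batch of dialogue pairs.
        batch = random.sample(dialogue_pairs, n_d)
        train_forward(batch)   # optimize L_p2r (Eq. 6) on the dialogue model (e, d)
        train_inverse(batch)   # optimize L_r2p (Eq. 7) on the inverse model (ê, d̂)

        # Lines 6-14: after N_f warm-up steps, build pseudo dialogue pairs by
        # back-translating each stylized text t into m candidate posts.
        if step > n_f:
            pseudo_pairs = []
            for t in random.sample(stylized_texts, n_s):
                for x in decode_posts(t, m):     # line 10: decode m posts from p_d̂(x | ê(t))
                    pseudo_pairs.append((x, t))  # line 11: pseudo pair ⟨x', t⟩
            train_pseudo(pseudo_pairs)           # line 13: optimize L_inv (Eq. 8) on (e, d)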
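
Below is a generic top-K sampling sketch consistent with the reported K = 20. It is not the authors' implementation; the tensor shape (a 1-D vector of next-token logits) is my assumption.

import torch

def top_k_sample(logits: torch.Tensor, k: int = 20) -> torch.Tensor:
    """Sample one token id from the k highest-scoring logits."""
    topk_vals, topk_idx = torch.topk(logits, k)       # keep the top-k logits
    probs = torch.softmax(topk_vals, dim=-1)          # renormalize over the k tokens
    choice = torch.multinomial(probs, num_samples=1)  # draw one of the k positions
    return topk_idx.gather(-1, choice)                # map back to a vocabulary id

Restricting sampling to the K most probable tokens before renormalizing trades some diversity for fluency, which matters when decoding the pseudo posts used to build the pseudo-parallel pairs.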