Response Enhanced Semi-supervised Dialogue Query Generation
Authors: Jianheng Huang, Ante Wang, Linfeng Gao, Linfeng Song, Jinsong Su
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results and in-depth analysis of three benchmarks show the effectiveness of our framework in cross-domain and low-resource scenarios. Particularly, SemiDQG significantly surpasses ChatGPT and competitive baselines. |
| Researcher Affiliation | Collaboration | Jianheng Huang1,2,3*, Ante Wang1,2,3*, Linfeng Gao1,3, Linfeng Song4, Jinsong Su1,2,3 1School of Informatics, Xiamen University, China 2Shanghai Artificial Intelligence Laboratory, China 3Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage of Fujian and Taiwan (Xiamen University), Ministry of Culture and Tourism, China 4Tencent AI Lab |
| Pseudocode | No | The paper includes a diagram (Figure 1) illustrating the framework's procedure but no explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/DeepLearnXMU/SemiDQG. |
| Open Datasets | Yes | We conduct experiments in both cross-domain and low-resource scenarios across three benchmarks. In the cross-domain scenario, we explore Wizard-of-Internet (WoI, Komeili, Shuster, and Weston 2022) → Wizard-of-Wikipedia (WoW, Dinan et al. 2018) in English, and DuSinc (Zhou et al. 2022) → KdConv (Zhou et al. 2020) in Chinese. For DuSinc, the paper explicitly states: 'We use its publicly available part4 for experiments.' with footnote 4 providing 'https://aistudio.baidu.com/aistudio/datasetdetail/139431'. |
| Dataset Splits | No | The paper mentions 'development sets' and 'low-resource scenarios' with sample numbers like 300, 500, 1k, 3k instances for training, but does not explicitly state the specific split percentages (e.g., 80/10/10) or exact sample counts for training, validation, and test sets for all datasets used to allow for direct reproduction of data partitioning. |
| Hardware Specification | No | The paper mentions using T5-base models from Huggingface but does not specify any hardware details like GPU models, CPU types, or memory used for training or inference. |
| Software Dependencies | No | The paper mentions using 'T5-base' and 'Langboat/mengzi-t5-base' checkpoints from Huggingface, and 'Adam optimizer', but it does not specify version numbers for programming languages (e.g., Python), libraries (e.g., PyTorch, TensorFlow, Huggingface Transformers), or CUDA. |
| Experiment Setup | Yes | During training, we apply an Adam optimizer, with a linear scheduler and an initial learning rate of 3e-5. We use a batch size of 64 for cross-domain experiments and 16 for low-resource counterparts. For the main experiments, we set N = 1 for query selection, and use Unigram F1 as the default F_sim. The selection of hyperparameter α for WoW/WoI/KdConv is 1.0/1.0/0.5, respectively. We set Nc = 10 for rank-based reward in the cross-domain scenario and Nc = 3 for other settings. |
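The reported optimizer settings can be sketched in PyTorch. This is a minimal illustration, not the authors' code: a small linear layer stands in for the T5-base checkpoint the paper loads from Huggingface, and `total_steps` is an assumed value, since the paper does not report the training-step count.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

# Placeholder module; the paper fine-tunes T5-base
# ("t5-base" / "Langboat/mengzi-t5-base" from Huggingface).
model = torch.nn.Linear(8, 8)

# Adam with the reported initial learning rate of 3e-5.
optimizer = Adam(model.parameters(), lr=3e-5)

total_steps = 1000  # assumed; not stated in the paper


def linear_decay(step: int) -> float:
    # Linear schedule: decay the learning rate to 0 over training.
    return max(0.0, 1.0 - step / total_steps)


scheduler = LambdaLR(optimizer, lr_lambda=linear_decay)

batch_size = 64  # cross-domain setting; 16 in the low-resource setting

# Stand-in training loop showing the optimizer/scheduler call order.
for step in range(3):
    optimizer.step()
    scheduler.step()
```

The learning rate after each step is the initial 3e-5 scaled by `linear_decay`, so reproducing the schedule only requires knowing the total number of training steps.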