Contrastive Learning Reduces Hallucination in Conversations

Authors: Weiwei Sun, Zhengliang Shi, Shen Gao, Pengjie Ren, Maarten de Rijke, Zhaochun Ren

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on the Wizard of Wikipedia, a public, open-domain knowledge-grounded dialogue benchmark, and assess the effectiveness of MixCL. MixCL effectively reduces the hallucination of LMs in conversations and achieves the highest performance among LM-based dialogue agents in terms of relevancy and factuality. ... Our contributions are as follows: ... (iv) Experiments on the Wizard-of-Wikipedia dataset show that MixCL effectively reduces the hallucinating content produced by the LM and achieves comparable performance to KB-based approaches. ... 6 Experimental Setup ... 7 Experimental Results
Researcher Affiliation | Academia | Shandong University, Qingdao, China; University of Amsterdam, Amsterdam, The Netherlands
Pseudocode | No | The paper describes its method through text and diagrams (e.g., Figure 3) but does not include formal pseudocode or algorithm blocks.
Open Source Code | Yes | We release our code at https://github.com/sunnweiwei/MixCL.
Open Datasets | Yes | We conduct experiments on the Wizard of Wikipedia (WoW) dataset. WoW is built with crowd-sourcing and employs Wikipedia as the knowledge corpus. ... The ground-truth knowledge used in each turn is manually labeled. ... (Dinan et al. 2019) (see the data-loading sketch after the table)
Dataset Splits | No | The paper states,
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions using
Experiment Setup | Yes | We determine the hyperparameters through pilot experiments. We set the weight of the language model loss α3 to 0.3 at initialization and linearly decay until 0. We set α1 and α2, i.e., the weight of the MLE loss and MCL loss, to 0.4 and 0.3, respectively, and linearly increase to 0.5 and 0.5. We use greedy decoding in testing. (see the schedule sketch after the table)
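
The quoted text does not say how the WoW data was obtained. One common route is ParlAI, which distributes the dataset under a `wizard_of_wikipedia` task; the sketch below illustrates that route under the assumption that the `parlai` package is used (the task name and `DisplayData` API are ParlAI's conventions, not something stated by the authors).

```python
# Hedged sketch: inspect Wizard of Wikipedia (WoW) via ParlAI.
# Assumes `pip install parlai`; ParlAI downloads the dataset on first use.
from parlai.scripts.display_data import DisplayData

# Print a few training dialogues, including the manually labeled
# gold knowledge sentence for each turn.
DisplayData.main(task='wizard_of_wikipedia', num_examples=3, datatype='train')
```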
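
The Experiment Setup row describes a linear schedule over three loss weights: α1 (MLE) goes 0.4 → 0.5, α2 (MCL) goes 0.3 → 0.5, and α3 (LM) goes 0.3 → 0. A minimal sketch of that interpolation, assuming a known total step count (names such as `loss_weights` and `total_steps` are illustrative, not taken from the released code):

```python
# Hedged sketch of the linear loss-weight schedule quoted above.
# `total_steps` and the combined loss at the bottom are illustrative assumptions.

def loss_weights(step: int, total_steps: int) -> tuple[float, float, float]:
    """Linearly interpolate (alpha1, alpha2, alpha3) over training.

    alpha1 (MLE loss): 0.4 -> 0.5
    alpha2 (MCL loss): 0.3 -> 0.5
    alpha3 (LM  loss): 0.3 -> 0.0
    """
    t = min(max(step / total_steps, 0.0), 1.0)  # training progress in [0, 1]
    alpha1 = 0.4 + 0.1 * t
    alpha2 = 0.3 + 0.2 * t
    alpha3 = 0.3 * (1.0 - t)
    return alpha1, alpha2, alpha3

# Example combination halfway through training (losses computed elsewhere):
# a1, a2, a3 = loss_weights(5000, 10_000)   # -> (0.45, 0.40, 0.15)
# loss = a1 * mle_loss + a2 * mcl_loss + a3 * lm_loss
```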