Contrastive Learning Reduces Hallucination in Conversations

Authors: Weiwei Sun, Zhengliang Shi, Shen Gao, Pengjie Ren, Maarten de Rijke, Zhaochun Ren

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on the Wizard of Wikipedia, a public, open-domain knowledge-grounded dialogue benchmark, and assess the effectiveness of MixCL. MixCL effectively reduces the hallucination of LMs in conversations and achieves the highest performance among LM-based dialogue agents in terms of relevancy and factuality. ... Our contributions are as follows: ... (iv) Experiments on the Wizard-of-Wikipedia dataset show that MixCL effectively reduces the hallucinating content produced by the LM and achieves comparable performance to KB-based approaches. ... 6 Experimental Setup ... 7 Experimental Results
Researcher Affiliation | Academia | Shandong University, Qingdao, China; University of Amsterdam, Amsterdam, The Netherlands
Pseudocode | No | The paper describes its method through text and diagrams (e.g., Figure 3) but does not include formal pseudocode or algorithm blocks.
Open Source Code | Yes | We release our code at https://github.com/sunnweiwei/MixCL.
Open Datasets | Yes | We conduct experiments on the Wizard of Wikipedia (WoW) dataset. WoW is built with crowd-sourcing and employs Wikipedia as the knowledge corpus. ... The ground-truth knowledge used in each turn is manually labeled. ... (Dinan et al. 2019) (see the data-loading sketch after the table)
Dataset Splits | No | The paper states,
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions using
Experiment Setup | Yes | We determine the hyperparameters through pilot experiments. We set the weight of the language model loss α3 to 0.3 at initialization and linearly decay until 0. We set α1 and α2, i.e., the weight of the MLE loss and MCL loss, to 0.4 and 0.3, respectively, and linearly increase to 0.5 and 0.5. We use greedy decoding in testing. (see the schedule sketch after the table)
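
The quoted text does not say how the WoW data was obtained. One common route is ParlAI, which distributes the dataset under a `wizard_of_wikipedia` task; the sketch below illustrates that route under the assumption that the `parlai` package is used (the task name and `DisplayData` API are ParlAI's conventions, not something stated by the authors).

```python
# Hedged sketch: inspect Wizard of Wikipedia (WoW) via ParlAI.
# Assumes `pip install parlai`; ParlAI downloads the dataset on first use.
from parlai.scripts.display_data import DisplayData

# Print a few training dialogues, including the manually labeled
# gold knowledge sentence for each turn.
DisplayData.main(task='wizard_of_wikipedia', num_examples=3, datatype='train')
```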
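
The Experiment Setup row describes a linear schedule over three loss weights: α1 (MLE) goes 0.4 → 0.5, α2 (MCL) goes 0.3 → 0.5, and α3 (LM) goes 0.3 → 0. A minimal sketch of that interpolation, assuming a known total step count (names such as `loss_weights` and `total_steps` are illustrative, not taken from the released code):

```python
# Hedged sketch of the linear loss-weight schedule quoted above.
# `total_steps` and the combined loss at the bottom are illustrative assumptions.

def loss_weights(step: int, total_steps: int) -> tuple[float, float, float]:
    """Linearly interpolate (alpha1, alpha2, alpha3) over training.

    alpha1 (MLE loss): 0.4 -> 0.5
    alpha2 (MCL loss): 0.3 -> 0.5
    alpha3 (LM  loss): 0.3 -> 0.0
    """
    t = min(max(step / total_steps, 0.0), 1.0)  # training progress in [0, 1]
    alpha1 = 0.4 + 0.1 * t
    alpha2 = 0.3 + 0.2 * t
    alpha3 = 0.3 * (1.0 - t)
    return alpha1, alpha2, alpha3

# Example combination halfway through training (losses computed elsewhere):
# a1, a2, a3 = loss_weights(5000, 10_000)   # -> (0.45, 0.40, 0.15)
# loss = a1 * mle_loss + a2 * mcl_loss + a3 * lm_loss
```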