Can LLM-Generated Misinformation Be Detected?
Authors: Canyu Chen, Kai Shu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Then, through extensive empirical investigation, we discover that LLM-generated misinformation can be harder to detect for humans and detectors compared to human-written misinformation with the same semantics, which suggests it can have more deceptive styles and potentially cause more harm. |
| Researcher Affiliation | Academia | Canyu Chen (Illinois Institute of Technology, cchen151@hawk.iit.edu); Kai Shu (Illinois Institute of Technology, kshu@iit.edu) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | Project website: https://llm-misinformation.github.io/. The website states 'Our code will be made publicly available soon.' and links only to the dataset repository, not the methodology code. The paper likewise states: 'The dataset has been open-sourced in the GitHub repository https://github.com/llm-misinformation/llm-misinformation.', which refers to the data, not the source code for the methodology itself. |
| Open Datasets | Yes | We adopt three typical real-world human-written misinformation datasets including Politifact (Shu et al., 2020), Gossipcop (Shu et al., 2020) and CoAID (Cui & Lee, 2020). ... The dataset has been open-sourced in the GitHub repository https://github.com/llm-misinformation/llm-misinformation. |
| Dataset Splits | No | The paper uses LLMs as detectors with a zero-shot prompting strategy, so no model is trained or validated on splits of these datasets in the traditional sense. The data is used only for evaluation: "we utilize the whole Politifact dataset and the randomly sampled 10% data of the Gossipcop and CoAID datasets with the random seed as 1." This describes the evaluation data, not training/validation splits. (A hedged reconstruction of this sampling appears after the table.) |
| Hardware Specification | No | The paper mentions using specific LLM models (e.g., ChatGPT-3.5, GPT-4, Llama2, Vicuna) and their APIs but does not specify the underlying hardware (e.g., GPU models, CPU types) on which these models or experiments were run. |
| Software Dependencies | Yes | As for ChatGPT-3.5 (gpt-3.5-turbo) or GPT-4 (gpt-4) as generators or detectors, we adopt the default API setting of OpenAI. As for Llama2 (Llama2-7B-chat, Llama2-13B-chat, and Llama2-70B-chat) and Vicuna (Vicuna-7b-v1.3, Vicuna-13b-v1.3, and Vicuna-33b-v1.3) as generators or detectors, we adopt the hyperparameters for the sampling strategy as follows: top_p = 0.9, temperature = 0.8, max_tokens = 2,000. (Hedged reconstructions of both API setups appear after the table.) |
| Experiment Setup | Yes | As for Llama2 (Llama2-7B-chat, Llama2-13B-chat, and Llama2-70B-chat) and Vicuna (Vicuna-7b-v1.3, Vicuna-13b-v1.3, and Vicuna-33b-v1.3) as generators or detectors, we adopt the hyperparameters for the sampling strategy as follows: top_p = 0.9, temperature = 0.8, max_tokens = 2,000. |
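The evaluation-data selection quoted under Dataset Splits ("the whole Politifact dataset and the randomly sampled 10% data of the Gossipcop and CoAID datasets with the random seed as 1") is specific enough to reconstruct. The sketch below is a plausible Python reading of it, not the authors' code: the CSV filenames, the column layout, and the use of pandas are assumptions; only the 10% fraction and seed 1 come from the paper.

```python
# Hedged sketch of the evaluation-data selection described in the paper.
# File names are hypothetical; only frac=0.1 and random_state=1 are sourced.
import pandas as pd

gossipcop = pd.read_csv("gossipcop.csv")  # hypothetical filename
coaid = pd.read_csv("coaid.csv")          # hypothetical filename

# 10% random sample with random seed 1, as stated; Politifact is used whole.
gossipcop_eval = gossipcop.sample(frac=0.1, random_state=1)
coaid_eval = coaid.sample(frac=0.1, random_state=1)
```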
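For ChatGPT-3.5 and GPT-4, the paper reports only "the default API setting of OpenAI", i.e., no custom decoding hyperparameters. A minimal sketch using the current openai Python client; the client version (the paper predates the v1 interface) and the prompt text are assumptions:

```python
# Hedged sketch: calls gpt-3.5-turbo with OpenAI's defaults, passing no
# temperature/top_p overrides, matching "the default API setting of OpenAI".
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # or "gpt-4", as named in the paper
    messages=[{"role": "user", "content": "<generation or detection prompt>"}],
)
print(response.choices[0].message.content)
```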
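The Llama2/Vicuna decoding settings (top_p = 0.9, temperature = 0.8, max_tokens = 2,000) map onto standard nucleus-sampling parameters. A minimal sketch assuming the Hugging Face transformers API, which the paper does not name; the checkpoint, the prompt, and the mapping of max_tokens to max_new_tokens are assumptions:

```python
# Hedged reconstruction: the paper reports only the sampling hyperparameters,
# not the inference stack. Hugging Face transformers is an assumption here.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "lmsys/vicuna-7b-v1.3"  # one of the checkpoints named in the paper

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

prompt = "<generation or detection prompt>"  # not specified in this section
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sampling strategy exactly as reported: top_p = 0.9, temperature = 0.8,
# max_tokens = 2,000 (mapped to max_new_tokens, an assumption about intent).
outputs = model.generate(
    **inputs,
    do_sample=True,  # required for top_p / temperature to take effect
    top_p=0.9,
    temperature=0.8,
    max_new_tokens=2000,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```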