Unveiling and Manipulating Prompt Influence in Large Language Models

Authors: Zijian Feng, Hanzhang Zhou, Zixiao Zhu, Junlang Qian, Kezhi Mao

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments reveal that the TDD surpasses state-of-the-art baselines with a big margin in elucidating the causal relationships between prompts and LLM outputs. Comprehensive experiments show that TDD markedly outperforms advanced saliency methods in discerning the causal relationships between prompts and LLM outputs across the entire vocabulary.
Researcher Affiliation | Academia | 1 Institute of Catastrophe Risk Management, Interdisciplinary Graduate Programme, Nanyang Technological University, Singapore; 2 School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore; 3 Future Resilient Systems Programme, Singapore-ETH Centre, CREATE Campus, Singapore
Pseudocode | No | The paper describes procedures using mathematical formulas and textual explanations but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | Code will be released here: https://github.com/zijian678/TDD
Open Datasets | Yes | The Benchmark of Linguistic Minimal Pairs (BLiMP) encompasses 67 distinct datasets... we utilize BLiMP (Warstadt et al., 2020)... AG's News (Zhang et al., 2015) for topic classification and SST2 (Socher et al., 2013) for sentiment analysis... we employ the 5000 neutral prompts from OWT (Gokaslan & Cohen, 2019)
Dataset Splits | No | The paper discusses evaluation metrics like AOPC and Sufficiency based on perturbing tokens, and mentions datasets like BLiMP, AG's News, SST2, and OWT, but does not provide specific train/validation/test dataset split percentages or sample counts.
Hardware Specification | Yes | ...all experiments, which are conducted on an NVIDIA RTX A5000 GPU.
Software Dependencies | No | The paper mentions that '4-bit versions are utilized' for models larger than 6 billion parameters but does not provide specific version numbers for software dependencies like programming languages or libraries (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | For our experiments, we utilize GPT2 and, in adherence to the methodology of Schick et al. (2021), generate token continuations limited to 20 tokens. We employ TDD to pinpoint the top 15% of crucial tokens, treat them as triggers, and subsequently neutralize them. The generation parameters remain consistent with those of toxic language suppression, with the exception that only one token is replaced in each prompt since the length of the prompt is generally smaller than 10. The hyperparameters of these two baselines strictly follow the authors' recommendations.
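
The sketch below is a minimal illustration, not the authors' implementation, of the experiment setup quoted above: GPT-2 continuations capped at 20 tokens, with the top 15% of prompt tokens treated as triggers and neutralized before generation. The `token_saliency` function is a hypothetical placeholder for TDD's actual token scoring, and the neutral replacement token and generation parameters are assumptions made only so the example runs end to end with Hugging Face transformers.

```python
# Minimal sketch (assumed setup, not the paper's code): GPT-2 generation limited
# to 20 new tokens, after replacing the top 15% most "influential" prompt tokens
# with a neutral token. token_saliency() is a hypothetical stand-in for TDD.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def token_saliency(input_ids: torch.Tensor) -> torch.Tensor:
    # Placeholder scoring: random values so the sketch is runnable.
    # TDD would instead derive per-token influence from token distributions.
    return torch.rand(input_ids.shape[-1])

def neutralize_triggers(prompt: str, ratio: float = 0.15) -> torch.Tensor:
    ids = tokenizer(prompt, return_tensors="pt").input_ids[0]
    scores = token_saliency(ids)
    k = max(1, int(ratio * len(ids)))          # top 15% of prompt tokens
    trigger_positions = scores.topk(k).indices
    neutral_id = tokenizer.encode(" ")[0]      # assumed neutral replacement token
    ids[trigger_positions] = neutral_id
    return ids.unsqueeze(0)

prompt = "An example prompt whose most influential tokens get neutralized."
input_ids = neutralize_triggers(prompt)
output = model.generate(
    input_ids,
    max_new_tokens=20,                         # 20-token continuation limit
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```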