Unveiling and Manipulating Prompt Influence in Large Language Models
Authors: Zijian Feng, Hanzhang Zhou, Zixiao Zhu, Junlang Qian, Kezhi Mao
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments reveal that TDD surpasses state-of-the-art baselines by a large margin in elucidating the causal relationships between prompts and LLM outputs. Comprehensive experiments show that TDD markedly outperforms advanced saliency methods in discerning the causal relationships between prompts and LLM outputs across the entire vocabulary. |
| Researcher Affiliation | Academia | 1Institute of Catastrophe Risk Management, Interdisciplinary Graduate Programme, Nanyang Technological University, Singapore 2School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 3Future Resilient Systems Programme, Singapore-ETH Centre, CREATE Campus, Singapore |
| Pseudocode | No | The paper describes procedures using mathematical formulas and textual explanations but does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | 1Code will be released here: https://github.com/zijian678/TDD |
| Open Datasets | Yes | The Benchmark of Linguistic Minimal Pairs (BLiMP) encompasses 67 distinct datasets...we utilize BLiMP (Warstadt et al., 2020)...AG's News (Zhang et al., 2015) for topic classification and SST2 (Socher et al., 2013) for sentiment analysis...we employ the 5000 neutral prompts from OWT (Gokaslan & Cohen, 2019) |
| Dataset Splits | No | The paper discusses evaluation metrics like AOPC and Sufficiency based on perturbing tokens, and mentions datasets like BLiMP, AG's News, SST2, and OWT, but does not provide specific train/validation/test dataset split percentages or sample counts. |
| Hardware Specification | Yes | All experiments are conducted on an NVIDIA RTX A5000 GPU. |
| Software Dependencies | No | The paper mentions that '4-bit versions are utilized' for models larger than 6 billion parameters but does not provide specific version numbers for software dependencies like programming languages or libraries (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | For our experiments, we utilize GPT2 and, in adherence to the methodology of Schick et al. (2021), generate token continuations limited to 20 tokens. We employ TDD to pinpoint the top 15% of crucial tokens, treat them as triggers, and subsequently neutralize them. The generation parameters remain consistent with those of toxic language suppression, with the exception that only one token is replaced in each prompt, since the length of the prompt is generally smaller than 10. The hyperparameters of these two baselines strictly follow the authors' recommendations. |
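The "pinpoint the top 15% of crucial tokens, treat them as triggers, and subsequently neutralize them" step quoted above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the saliency scores would come from TDD itself (stand-in values are used here), and masking with a placeholder token is an assumed neutralization strategy.

```python
# Hypothetical sketch: select the top-15% most salient prompt tokens and
# neutralize them. Saliency scores are stand-in values; in the paper they
# would be produced by TDD, and the masking strategy here is an assumption.
import math

def neutralize_top_tokens(tokens, saliency, top_frac=0.15, mask="[MASK]"):
    """Replace the top `top_frac` fraction of tokens (by saliency) with `mask`."""
    assert len(tokens) == len(saliency)
    # At least one token is always neutralized, matching the single-token
    # replacement noted for short prompts.
    k = max(1, math.ceil(len(tokens) * top_frac))
    # Indices of the k highest-saliency tokens.
    top = set(sorted(range(len(tokens)),
                     key=lambda i: saliency[i], reverse=True)[:k])
    return [mask if i in top else t for i, t in enumerate(tokens)]

tokens = ["The", "movie", "was", "absolutely", "terrible", "and", "boring"]
saliency = [0.01, 0.05, 0.02, 0.30, 0.45, 0.03, 0.25]
print(neutralize_top_tokens(tokens, saliency))
# → ['The', 'movie', 'was', '[MASK]', '[MASK]', 'and', 'boring']
```

With 7 tokens, ceil(7 × 0.15) = 2 tokens are treated as triggers and masked.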