Spectral Editing of Activations for Large Language Model Alignment

Authors: Yifu QIU, Zheng Zhao, Yftah Ziser, Anna Korhonen, Edoardo Maria Ponti, Shay Cohen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We run extensive experiments on benchmarks concerning truthfulness and bias with six open-source LLMs of different sizes and model families. The results demonstrate the superiority of SEA in effectiveness, generalisation to similar tasks, as well as computation and data efficiency.
Researcher Affiliation | Collaboration | Yifu Qiu¹, Zheng Zhao¹, Yftah Ziser³, Anna Korhonen², Edoardo M. Ponti¹, Shay B. Cohen¹. ¹Institute for Language, Cognition and Computation, University of Edinburgh; ²Language Technology Lab, University of Cambridge; ³Nvidia Research
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes the method in prose and equations.
Open Source Code | Yes | Our code and edited models are available at https://github.com/yfqiu-nlp/sea-llm.
Open Datasets | Yes | TruthfulQA [23]: Apache-2.0 license; readers can find the version we use in this paper at https://github.com/sylinrl/TruthfulQA. HaluEval [20]: MIT license; readers can find the version we use in this paper at https://github.com/RUCAIBox/HaluEval. BBQ [27]: CC-BY-4.0 license; readers can find the version we use in this paper at https://github.com/nyu-mll/BBQ.
Dataset Splits | Yes | All hyperparameters are determined with two-fold cross-validation on TruthfulQA following Li et al. [21].
Hardware Specification | Yes | All experiments for LLaMA-2-Chat-7B are conducted on a single machine (Intel Xeon Platinum 8360Y CPUs), utilising 32 cores per experiment, with one 40GB NVIDIA A100 Tensor Core GPU.
Software Dependencies | Yes | Transformers [42]: Apache-2.0 license. We use version 4.38.0, available at https://github.com/huggingface/transformers. LLaMA-Factory [48]: Apache-2.0 license. We use the version at https://github.com/hiyouga/LLaMA-Factory.
Experiment Setup | Yes | In the best setting for linear SEA for truthfulness, we use 2000 pairs of demonstrations randomly sampled from our training split of TruthfulQA to obtain the editing projections for truthfulness. We set the hyperparameter K = 99.8% and we edit the top 21 layers. All hyperparameters are determined with two-fold cross-validation on TruthfulQA following Li et al. [21].
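The "editing projections" and the spectral-energy threshold K = 99.8% mentioned in the setup can be illustrated with a rough sketch. This is not the authors' implementation (their covariance construction and editing rule differ in detail; see the repository above); the `spectral_projection` function, the use of a simple positive-minus-negative difference matrix, and the energy-based rank cutoff are all assumptions made for illustration:

```python
import numpy as np

def spectral_projection(h_pos, h_neg, k=0.998):
    """Illustrative sketch: build a projection matrix from paired
    positive/negative demonstration activations via SVD.

    h_pos, h_neg: (n_examples, hidden_dim) activation matrices.
    k: fraction of spectral energy to retain (e.g. 0.998 for K = 99.8%).
    """
    # Difference matrix separating positive from negative behaviour
    # (an assumption for illustration; the paper derives its
    # projections from covariance-based spectral decompositions).
    diff = h_pos - h_neg
    # Spectral decomposition of the difference matrix.
    _, s, vt = np.linalg.svd(diff, full_matrices=False)
    # Smallest number of components covering fraction k of the energy.
    energy = np.cumsum(s**2) / np.sum(s**2)
    rank = int(np.searchsorted(energy, k)) + 1
    v_r = vt[:rank]  # (rank, hidden_dim), rows orthonormal
    # Orthogonal projection onto the retained subspace; an edited
    # hidden state would then be h @ proj.
    return v_r.T @ v_r
```

Because the rows of `v_r` are orthonormal, the returned matrix is a proper orthogonal projection (symmetric and idempotent), so applying it twice is the same as applying it once.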
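The hyperparameter search described above (two-fold cross-validation following Li et al. [21]) can be sketched in a few lines. This is a generic illustration, not the authors' code: `two_fold_cv_select`, the hyperparameter grid, and `score_fn` are placeholders for whatever settings (K, number of edited layers) and evaluation metric are being tuned:

```python
import numpy as np

def two_fold_cv_select(examples, hyperparams, score_fn, seed=0):
    """Illustrative sketch: pick the hyperparameter setting with the
    best mean score across two folds.

    score_fn(hp, train, test) -> float evaluates one setting on one fold.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(examples))
    folds = [idx[: len(idx) // 2], idx[len(idx) // 2:]]
    best_hp, best_score = None, -np.inf
    for hp in hyperparams:
        scores = []
        for held_out in (0, 1):
            train = [examples[i] for i in folds[1 - held_out]]
            test = [examples[i] for i in folds[held_out]]
            scores.append(score_fn(hp, train, test))
        mean_score = float(np.mean(scores))
        if mean_score > best_score:
            best_hp, best_score = hp, mean_score
    return best_hp, best_score
```

Each setting is fit on one half of the data and scored on the other, in both directions, so every example is used exactly once for evaluation per setting.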