Spectral Editing of Activations for Large Language Model Alignment

Authors: Yifu QIU, Zheng Zhao, Yftah Ziser, Anna Korhonen, Edoardo Maria Ponti, Shay Cohen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We run extensive experiments on benchmarks concerning truthfulness and bias with six open-source LLMs of different sizes and model families. The results demonstrate the superiority of SEA in effectiveness, generalisation to similar tasks, as well as computation and data efficiency.
Researcher Affiliation | Collaboration | Yifu Qiu¹, Zheng Zhao¹, Yftah Ziser³, Anna Korhonen², Edoardo M. Ponti¹, Shay B. Cohen¹. ¹Institute for Language, Cognition and Computation, University of Edinburgh; ²Language Technology Lab, University of Cambridge; ³Nvidia Research
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes the method in prose and equations.
Open Source Code | Yes | Our code and edited models are available at https://github.com/yfqiu-nlp/sea-llm.
Open Datasets | Yes | TruthfulQA [23]: Apache-2.0 license; readers can find the version we use in this paper at https://github.com/sylinrl/TruthfulQA. HaluEval [20]: MIT license; readers can find the version we use in this paper at https://github.com/RUCAIBox/HaluEval. BBQ [27]: CC-BY-4.0 license; readers can find the version we use in this paper at https://github.com/nyu-mll/BBQ.
Dataset Splits | Yes | All hyperparameters are determined with two-fold cross-validation on TruthfulQA following Li et al. [21].
Hardware Specification | Yes | All experiments for LLaMA-2-Chat-7B are conducted on a single machine (Intel Xeon Platinum 8360Y CPUs), utilising 32 cores per experiment, with one 40GB NVIDIA A100 Tensor Core GPU.
Software Dependencies | Yes | Transformers [42]: Apache-2.0 license. We use version 4.38.0, available at https://github.com/huggingface/transformers. LLaMA-Factory [48]: Apache-2.0 license. We use the version at https://github.com/hiyouga/LLaMA-Factory.
Experiment Setup | Yes | In the best setting for linear SEA for truthfulness, we use 2000 pairs of demonstrations randomly sampled from our training split of TruthfulQA to obtain the editing projections for truthfulness. We set the hyperparameter K = 99.8% and we edit the top 21 layers. All hyperparameters are determined with two-fold cross-validation on TruthfulQA following Li et al. [21].
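The "editing projections" and the spectral-energy threshold K = 99.8% mentioned in the setup can be illustrated with a rough sketch. This is not the authors' implementation (their covariance construction and editing rule differ in detail; see the repository above); the `spectral_projection` function, the use of a simple positive-minus-negative difference matrix, and the energy-based rank cutoff are all assumptions made for illustration:

```python
import numpy as np

def spectral_projection(h_pos, h_neg, k=0.998):
    """Illustrative sketch: build a projection matrix from paired
    positive/negative demonstration activations via SVD.

    h_pos, h_neg: (n_examples, hidden_dim) activation matrices.
    k: fraction of spectral energy to retain (e.g. 0.998 for K = 99.8%).
    """
    # Difference matrix separating positive from negative behaviour
    # (an assumption for illustration; the paper derives its
    # projections from covariance-based spectral decompositions).
    diff = h_pos - h_neg
    # Spectral decomposition of the difference matrix.
    _, s, vt = np.linalg.svd(diff, full_matrices=False)
    # Smallest number of components covering fraction k of the energy.
    energy = np.cumsum(s**2) / np.sum(s**2)
    rank = int(np.searchsorted(energy, k)) + 1
    v_r = vt[:rank]  # (rank, hidden_dim), rows orthonormal
    # Orthogonal projection onto the retained subspace; an edited
    # hidden state would then be h @ proj.
    return v_r.T @ v_r
```

Because the rows of `v_r` are orthonormal, the returned matrix is a proper orthogonal projection (symmetric and idempotent), so applying it twice is the same as applying it once.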
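The hyperparameter search described above (two-fold cross-validation following Li et al. [21]) can be sketched in a few lines. This is a generic illustration, not the authors' code: `two_fold_cv_select`, the hyperparameter grid, and `score_fn` are placeholders for whatever settings (K, number of edited layers) and evaluation metric are being tuned:

```python
import numpy as np

def two_fold_cv_select(examples, hyperparams, score_fn, seed=0):
    """Illustrative sketch: pick the hyperparameter setting with the
    best mean score across two folds.

    score_fn(hp, train, test) -> float evaluates one setting on one fold.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(examples))
    folds = [idx[: len(idx) // 2], idx[len(idx) // 2:]]
    best_hp, best_score = None, -np.inf
    for hp in hyperparams:
        scores = []
        for held_out in (0, 1):
            train = [examples[i] for i in folds[1 - held_out]]
            test = [examples[i] for i in folds[held_out]]
            scores.append(score_fn(hp, train, test))
        mean_score = float(np.mean(scores))
        if mean_score > best_score:
            best_hp, best_score = hp, mean_score
    return best_hp, best_score
```

Each setting is fit on one half of the data and scored on the other, in both directions, so every example is used exactly once for evaluation per setting.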