Aligning Large Language Models with Representation Editing: A Control Perspective

Authors: Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that our method outperforms existing test-time alignment techniques while requiring significantly fewer resources compared to fine-tuning methods.
Researcher Affiliation | Academia | Lingkai Kong 1, Haorui Wang 1, Wenhao Mu 1, Yuanqi Du 2, Yuchen Zhuang 1, Yifei Zhou 3, Yue Song 4, Rongzhi Zhang 1, Kai Wang 1, Chao Zhang 1 (1 Georgia Tech, 2 Cornell University, 3 UC Berkeley, 4 University of Trento)
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Our code is available at https://github.com/Lingkai-Kong/RE-Control.
Open Datasets | Yes | We evaluate our method on the HH-RLHF [5] and Stanford Human Preferences (SHP) [21] datasets, which are popular for LLM alignment.
Dataset Splits | Yes | We randomly sample 1000 data points from the training set as a separate validation set, which we use to select the hyperparameters (the step size α and the number of updates n) based on the sum of coherence, diversity, and average reward; a sketch of the test-time update these hyperparameters control appears after this table.
Hardware Specification | Yes | We conduct our experiments on a server equipped with NVIDIA A100 (80GB VRAM) GPUs.
Software Dependencies | Yes | We use NVIDIA CUDA toolkit version 12.4. All experiments are implemented in Python 3.12.2 with the PyTorch framework, version 2.2.2.
Experiment Setup | Yes | The training hyperparameters of the value networks are summarized in Table 3, and the inference parameters in Table 4. Table 6 provides training hyperparameters for proximal policy optimization (PPO) and Table 7 for direct preference optimization (DPO); illustrative sketches of the value-network setup and the editing update follow below.
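The Experiment Setup row refers to value networks trained on the base model's hidden states. Below is a minimal sketch, not the authors' released code, of that setup: a small MLP regressed onto reward scores of hidden states with an MSE loss. The architecture, learning rate, data sizes, and epoch count are illustrative placeholders rather than the values reported in Table 3.

```python
# Sketch of a value network over LLM hidden states (placeholder hyperparameters).
import torch
import torch.nn as nn

class ValueNetwork(nn.Module):
    """Maps a final-layer hidden state to a scalar value estimate."""
    def __init__(self, hidden_dim: int, inner_dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, inner_dim),
            nn.ReLU(),
            nn.Linear(inner_dim, 1),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.net(h).squeeze(-1)

hidden_dim = 4096                       # placeholder for the base LLM's hidden size
value_net = ValueNetwork(hidden_dim)
optimizer = torch.optim.Adam(value_net.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# Synthetic stand-ins for (hidden state, reward-model score) training pairs.
states = torch.randn(256, hidden_dim)
rewards = torch.randn(256)

for _ in range(10):                     # number of epochs is a placeholder
    optimizer.zero_grad()
    loss = loss_fn(value_net(states), rewards)
    loss.backward()
    optimizer.step()
```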
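As we read the method, inference-time alignment is performed by editing the hidden representation with gradient ascent on the learned value function, which appears to be what the step size α and the number of updates n tuned on the validation split parameterize. The sketch below shows that update in isolation, assuming a trained value network like the one above; wiring it into an actual LLM decoding loop (intervening on the hidden state before the language-model head at each step) is omitted, and alpha = 0.5 and n_updates = 3 are arbitrary example values, not the tuned ones.

```python
# Sketch of a test-time hidden-state edit: h <- h + alpha * grad_h V(h), repeated n times.
import torch
import torch.nn as nn

hidden_dim = 4096                                   # placeholder hidden size
value_net = nn.Sequential(                          # untrained stand-in value network
    nn.Linear(hidden_dim, 1024),
    nn.ReLU(),
    nn.Linear(1024, 1),
)

def edit_hidden_state(h: torch.Tensor, alpha: float = 0.5, n_updates: int = 3) -> torch.Tensor:
    """Nudge the hidden state along the value gradient for n_updates steps."""
    h = h.detach().clone()
    for _ in range(n_updates):
        h.requires_grad_(True)
        value = value_net(h).sum()                  # scalar objective over the batch
        (grad,) = torch.autograd.grad(value, h)
        h = (h + alpha * grad).detach()
    return h

# alpha and n_updates would be selected on the held-out validation split
# using the sum of coherence, diversity, and average reward described above.
h_edited = edit_hidden_state(torch.randn(1, hidden_dim))
```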