reproducibilityindex.ai

PaCE: Parsimonious Concept Engineering for Large Language Models

Authors: Jinqi Luo, Tianjiao Ding, Kwan Ho Ryan Chan, Darshan Thaker, Aditya Chattopadhyay, Chris Callison-Burch, Rene Vidal

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct experiments on tasks such as response detoxification, faithfulness enhancement, and sentiment revising, and show that PaCE achieves state-of-the-art alignment performance while maintaining linguistic capabilities.
Researcher Affiliation	Academia	University of Pennsylvania; Johns Hopkins University; {jinqiluo,tjding}@upenn.edu
Pseudocode	Yes	Algorithm 1: Overcomplete Oblique Projection (ObliqProj)
Open Source Code	No	Our collected dataset for concept representations is available at https://github.com/peterljq/Parsimonious-Concept-Engineering. ... We open-source the PaCE-1M dataset to facilitate future research and practical applications of LLM alignment, and will release the source code soon.
Open Datasets	Yes	Our collected dataset for concept representations is available at https://github.com/peterljq/Parsimonious-Concept-Engineering.
Dataset Splits	No	We conduct experiments on tasks such as response detoxification, faithfulness enhancement, and sentiment revising... We compare our method in defending maliciousness against activation manipulation methods (§2.2) on the SafeEdit [74] dataset with its safety scorer... We use the HolisticBias suite [66] and hate speech evaluator [64] to measure the sentiment of the response...
Hardware Specification	Yes	The experiments are conducted on a workstation of 8 NVIDIA A40 GPUs.
Software Dependencies	No	GPT-4-0125 is used for dictionary construction and concept partition. ... After retrieving the relevant knowledge (with the contriever [27]) from Wikipedia for concept synthesis, we take the top-5 ranked facts to append the instruction of LLM. The FAISS-indexed [31] Wikipedia is a snapshot of the 21 million disjoint text blocks from Wikipedia until December 2018.
Experiment Setup	Yes	Each response of the target LLM is set at a maximum of 512 tokens. Activation vectors are extracted from the last-29th to the last-11th layer (totaling 19 layers) of the target LLM's decoder layers. ... We set the scalar of the representation reading for concept vectors to 3.0. ... When solving the optimization problem for decomposition in §3.3, we set τ = 0.95 and α = 0.05 following the observations in [82].
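
The Experiment Setup values above can be gathered into a small configuration sketch. This is an illustration only, not the authors' code (which the table notes is not yet released): the class name `PaCESetup`, its field names, and the helper `selected_layers` are hypothetical. It also assumes that "last-k-th" counts from the end starting at 1, so the last-1st layer is the final decoder layer; that reading is consistent with the stated total of 19 layers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaCESetup:
    """Hyperparameters quoted in the Experiment Setup row (field names are hypothetical)."""
    max_response_tokens: int = 512   # maximum tokens per response of the target LLM
    first_from_end: int = 29         # "last-29th" decoder layer
    last_from_end: int = 11          # "last-11th" decoder layer
    reading_scalar: float = 3.0      # scalar of the representation reading
    tau: float = 0.95                # τ used in the decomposition problem
    alpha: float = 0.05              # α used in the decomposition problem


def selected_layers(num_layers: int, setup: PaCESetup = PaCESetup()) -> list[int]:
    """Return 0-based indices of the decoder layers from last-29th to last-11th.

    Assumes 1-based counting from the end: the last-1st layer has
    index num_layers - 1.
    """
    if num_layers < setup.first_from_end:
        raise ValueError("model has fewer decoder layers than the range requires")
    start = num_layers - setup.first_from_end
    stop = num_layers - setup.last_from_end + 1  # exclusive upper bound
    return list(range(start, stop))


# Example with a hypothetical 32-layer decoder.
layers = selected_layers(32)
print(len(layers), layers[0], layers[-1])
```

For a hypothetical 32-layer decoder this selects indices 3 through 21, which is 19 layers. The decoder depth must be supplied by the caller, since this row of the table does not state it.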