HYDRA: Model Factorization Framework for Black-Box LLM Personalization
Authors: Yuchen Zhuang, Haotian Sun, Yue Yu, Rushi Qiang, Qifan Wang, Chao Zhang, Bo Dai
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that HYDRA outperforms existing state-of-the-art prompt-based methods by an average relative improvement of 9.01% across five diverse personalization tasks in the LaMP benchmark. Our implementation is available at https://github.com/night-chen/HYDRA. We conduct extensive experiments on LaMP [41], a comprehensive language model personalization benchmark, to evaluate the personalization capabilities of HYDRA across multiple dimensions, including three text classification tasks and two text generation tasks. |
| Researcher Affiliation | Collaboration | Yuchen Zhuang¹, Haotian Sun¹, Yue Yu¹, Rushi Qiang¹, Qifan Wang², Chao Zhang¹, Bo Dai¹ (¹Georgia Institute of Technology, ²Meta AI). {yczhuang, haotian.sun, yueyu, rqiang6, chaozhang}@gatech.edu, wqfcr@fb.com, bodai@cc.gatech.edu |
| Pseudocode | Yes | Algorithm 1: HYDRA. (in Appendix C) |
| Open Source Code | Yes | Our implementation is available at https://github.com/night-chen/HYDRA. To facilitate future research in black-box LLM personalization, we will release the code repository and model checkpoints for transparency and reproducibility. |
| Open Datasets | Yes | We adopt a widely used language model personalization benchmark, LaMP [41], focusing on a diverse set of personalized text classification and generation tasks... Data is publicly available in the LaMP benchmark [41]. |
| Dataset Splits | Yes | For data splitting, we follow the user-based separation setting provided by the LaMP benchmark, with 100 randomly selected users for training and an additional 50 randomly selected users for testing (Section 4.1). The Dataset Statistics table (Table 6) also includes a '# Validation' column for each task. (A minimal sketch of the user-based split appears after this table.) |
| Hardware Specification | Yes | Hardware and Software. We conduct all black-box LLM personalization experiments on CPU: AMD(R) EPYC(R) 7702 64-Core Processor @ 1.50GHz and GPU: NVIDIA A100-SXM4-80GB using Python 3.10.13. |
| Software Dependencies | Yes | All black-box LLM personalization experiments use Python 3.10.13 (hardware as listed above). We employ gpt-3.5-turbo-1106 and BM25 [39] as the backbone black-box LLM and default retriever, respectively. Both HYDRA-Reranker and HYDRA-Adapter leverage the lightweight Longformer-Base (110M) [1] as the backend language model. In addition, we employ AdamW [19] as the optimizer with a weight decay of 0.01. (A minimal dependency sketch appears after this table.) |
| Experiment Setup | Yes | We set the maximum sequence length for a generated solution to 512 tokens and the temperature to 1.0 to allow flexibility in the black-box LLM generations, which serve as potential solution candidates. For the other baselines, we keep the temperature at 0 to avoid potential instability in performance. During the black-box LLM adaptation stage, we set b = 8 for the generation of intermediate candidates using HYDRA-Adapter. During the training stage, we set the learning rate to 2e-5, the global batch size to 64, and the number of training epochs to 2 as the default hyperparameter settings for all experiments. In addition, we employ AdamW [19] as the optimizer with a weight decay of 0.01. (A sketch of the candidate-generation settings follows this table.) |
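To make the Dataset Splits row concrete: the LaMP separation is over users, not individual examples, so all examples belonging to a held-out user stay out of training. Below is a minimal sketch of such a user-based split; the function name, seed, and user IDs are illustrative assumptions, not artifacts from the paper.

```python
import random

def user_based_split(user_ids, n_train=100, n_test=50, seed=0):
    """Partition *users* (not individual examples) into train/test pools,
    mirroring the LaMP user-based separation quoted above.
    Hypothetical helper; only the 100/50 counts come from the paper."""
    rng = random.Random(seed)  # illustrative seed, not from the paper
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    assert len(shuffled) >= n_train + n_test, "not enough users to split"
    return shuffled[:n_train], shuffled[n_train:n_train + n_test]

train_users, test_users = user_based_split([f"user_{i}" for i in range(1000)])
print(len(train_users), len(test_users))  # 100 50
```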
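For the Software Dependencies row, a minimal sketch of how the stated components could be wired together: Longformer-Base as the lightweight backend for HYDRA-Reranker/-Adapter, AdamW with weight decay 0.01, and BM25 as the retriever over a user's profile history. The Hugging Face checkpoint `allenai/longformer-base-4096` and the `rank_bm25` package are assumptions; the paper names only "Longformer-Base (110M)" and "BM25".

```python
from rank_bm25 import BM25Okapi  # assumed BM25 implementation
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Longformer-Base (~110M params); checkpoint name is an assumption.
tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")
model = AutoModelForSequenceClassification.from_pretrained(
    "allenai/longformer-base-4096"
)

# AdamW with weight decay 0.01 and lr 2e-5, as stated in the rows above.
optimizer = AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)

# BM25 over a user's profile history (whitespace tokenization is an assumption).
profile = ["review of a sci-fi novel", "notes on a cooking blog"]
bm25 = BM25Okapi([doc.split() for doc in profile])
scores = bm25.get_scores("sci-fi book review".split())
```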
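And for the Experiment Setup row, a sketch of the candidate-generation call implied by temperature 1.0, the 512-token cap, and b = 8 intermediate candidates. The `sample_candidates` helper and the prompt format are hypothetical; only the decoding parameters come from the quoted text.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def sample_candidates(prompt: str, b: int = 8) -> list[str]:
    """Draw b candidate responses from the black-box LLM at temperature 1.0,
    matching the candidate-generation settings quoted above."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,  # flexibility in candidate generation
        max_tokens=512,   # maximum length of a generated solution
        n=b,              # b = 8 intermediate candidates for HYDRA-Adapter
    )
    return [choice.message.content for choice in resp.choices]
```

These candidates would then be scored by the reranker/adapter components; that scoring step is not shown here.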