Risk-Averse Fine-tuning of Large Language Models
Authors: Sapana Chaudhary, Ujwal Dinesha, Dileep Kalathil, Srinivas Shakkottai
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations on sentiment modification and toxicity mitigation tasks demonstrate the efficacy of risk-averse reinforcement learning with human feedback (RLHF) in promoting a safer and more constructive online discourse environment. |
| Researcher Affiliation | Collaboration | Sapana Chaudhary, Amazon Web Services (AWS), chausapa@amazon.com; Ujwal Dinesha, Dileep Kalathil, Srinivas Shakkottai, Department of Electrical and Computer Engineering, Texas A&M University, {ujwald36,dileep.kalathil,sshakkot}@tamu.edu |
| Pseudocode | Yes | Our RA-RLHF pseudo-code is included in Algorithm 1. (An illustrative risk-averse batch-selection sketch is given after this table.) |
| Open Source Code | Yes | Our codebase is available on the linked GitHub repository, and further implementation details are included in Appendix E. |
| Open Datasets | Yes | In the first task, the LLM is provided with the initial part of a movie review from the IMDB data set [Maas et al., 2011]... We created two additional tasks using the Jigsaw [Jigsaw, 2017] and Real Toxicity Prompts [Gehman et al., 2020] datasets |
| Dataset Splits | Yes | For IMDB-Gen, we make use of the IMDB dataset... There are a total of 25k train and test reviews each. ...For constructing the task dataset, we sampled the original data to create a training set distribution of 70% non-toxic and 30% toxic data points and a test set containing 50% toxic and 50% non-toxic points. The resulting dataset consists of 36,973 training and 7,708 test samples. (A sketch of such a stratified resampling is given after this table.) |
| Hardware Specification | Yes | Our codes were run on machines with GPU configurations of NVIDIA Tesla V100 SXM2 32 GB, and NVIDIA A100 80 GB. |
| Software Dependencies | No | The paper mentions adapting implementations from the Hugging Face TRL repository and using specific Hugging Face models/tokenizers (e.g., 'AutoModelForCausalLMWithValueHead', 'GPT2TokenizerFast', 'lvwerra/distilbert-imdb', 'unitary/toxic-bert'), but it does not specify version numbers for general software dependencies such as Python or PyTorch. (A sketch of how these components are typically wired together is given after this table.) |
| Experiment Setup | Yes | The following is a list of hyperparameters used for PPO training. Any parameter not mentioned here was set to the default parameter generated by Hugging Face's PPOConfig object. Table 7: RLHF Hyperparameters... Table 8: RA-RLHF Hyperparameters (A sketch of such a PPOConfig override is given after this table.) |
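The paper's Algorithm 1 is not reproduced in this report. As a loose illustration of the risk-averse ingredient it refers to (updating the policy only on the lowest-return fraction of each rollout batch, a CVaR-style criterion), the sketch below shows one way such a selection step could look. The function name, the fixed risk level, and the surrounding PPO loop are assumptions for illustration and are not taken from the paper.

```python
import torch

def select_risk_averse_batch(returns: torch.Tensor, risk_level: float) -> torch.Tensor:
    """Return indices of the worst-performing fraction of a rollout batch.

    Generic CVaR-style selection: `risk_level` is the fraction of
    lowest-return episodes retained for the policy update. Illustration
    only; not the paper's Algorithm 1.
    """
    k = max(1, int(risk_level * returns.numel()))
    # Indices of the k smallest episode returns in this batch.
    _, worst_idx = torch.topk(returns, k, largest=False)
    return worst_idx

# Example: with 8 episode returns and risk_level=0.25, only the two
# lowest-return episodes would feed the subsequent PPO update.
returns = torch.tensor([1.2, -0.5, 0.3, 2.0, -1.1, 0.8, 0.0, 1.5])
worst = select_risk_averse_batch(returns, risk_level=0.25)
```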
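The dataset-splits row describes resampling the toxicity data into a 70% non-toxic / 30% toxic training mix and a 50/50 test mix. A minimal sketch of such a stratified resampling is shown below, assuming a pandas DataFrame with a boolean `toxic` column; the helper and column names are hypothetical, and the authors' actual preprocessing script may differ.

```python
import pandas as pd

def resample_by_toxicity(df: pd.DataFrame, n_total: int, frac_toxic: float,
                         seed: int = 0) -> pd.DataFrame:
    """Draw a split with a fixed toxic / non-toxic ratio.

    Assumes a boolean `toxic` column; hypothetical helper, not the
    authors' preprocessing code.
    """
    n_toxic = int(n_total * frac_toxic)
    toxic = df[df["toxic"]].sample(n=n_toxic, random_state=seed)
    clean = df[~df["toxic"]].sample(n=n_total - n_toxic, random_state=seed)
    # Concatenate and shuffle so toxic and non-toxic rows are interleaved.
    return pd.concat([toxic, clean]).sample(frac=1.0, random_state=seed)

# Mirroring the described mixes (train and test drawn from disjoint pools):
# train = resample_by_toxicity(train_pool, n_total=36_973, frac_toxic=0.30)
# test  = resample_by_toxicity(test_pool,  n_total=7_708,  frac_toxic=0.50)
```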
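Because the paper names the Hugging Face components but not their versions, the following is a minimal sketch of how those pieces are typically combined with the TRL library. The base checkpoint `gpt2` and the TRL version are assumptions; only the component names are taken from the paper.

```python
from transformers import GPT2TokenizerFast, pipeline
from trl import AutoModelForCausalLMWithValueHead

# Policy network with an added scalar value head, as used for PPO in TRL.
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Reward models named in the paper: a sentiment classifier for IMDB-Gen and
# a toxicity classifier for the Jigsaw / RealToxicityPrompts tasks.
sentiment_reward = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")
toxicity_reward = pipeline("text-classification", model="unitary/toxic-bert")
```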
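For the experiment setup, a minimal sketch of overriding a few fields of TRL's PPOConfig while leaving the rest at their defaults is shown below. The numeric values are placeholders rather than the settings from Tables 7 and 8, and keyword names vary across TRL versions, which the paper does not pin.

```python
from trl import PPOConfig

# Sketch only: fields not overridden here fall back to PPOConfig defaults,
# mirroring the setup described above. The numbers are placeholders, not the
# hyperparameters reported in the paper's Tables 7 and 8.
config = PPOConfig(
    model_name="gpt2",       # placeholder base checkpoint
    learning_rate=1.41e-5,   # placeholder
    batch_size=256,          # placeholder
    mini_batch_size=32,      # placeholder
)
# `config` is then passed to trl.PPOTrainer together with the model,
# tokenizer, and prompt dataset.
```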