Privately Aligning Language Models with Reinforcement Learning
Authors: Fan Wu, Huseyin A Inan, Arturs Backurs, Varun Chandrasekaran, Janardhan Kulkarni, Robert Sim
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results validate the effectiveness of our approach, offering competitive utility while ensuring strong privacy protections. We empirically evaluate our DP framework on commonly studied tasks (in non-privacy literature). |
| Researcher Affiliation | Collaboration | Fan Wu¹, Huseyin A. Inan², Arturs Backurs³, Varun Chandrasekaran¹, Janardhan Kulkarni³, Robert Sim²; ¹University of Illinois Urbana-Champaign, ²M365 Research, ³Microsoft Research |
| Pseudocode | Yes | We give a complete pseudo-code of PPO implementation in Appendix F. Algorithm 2: Differential Privacy Stochastic Gradient Descent (DPSGD). Algorithm 3: Aligning language models with RL (PPO), full version (a generic DP-SGD sketch is given below this table) |
| Open Source Code | No | The paper mentions using the TRL framework and links to its documentation (https://huggingface.co/docs/trl/index), but it does not provide a link to, or an explicit statement about, a release of the authors' own implementation of the described methodology. |
| Open Datasets | Yes | for a positive review generation task on the IMDb dataset (Maas et al., 2011), and (ii) alignment via RL from human feedback (RLHF) for a summarization task on the Reddit TL;DR dataset (Völske et al., 2017). |
| Dataset Splits | Yes | alignment with DP uses half of the training dataset in the SFT step and the remaining half in the RL step. Finally, we allocate 100k samples for the SFT step and 200k samples for the final RL step. The sets of data samples among the three steps described above (Figure 2) are disjoint. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments. It only mentions "compute constraints" in a general sense. |
| Software Dependencies | No | The paper mentions using the "TRL framework" and GPT-2 models but does not provide specific version numbers for any software dependencies like PyTorch, Python, or other libraries. |
| Experiment Setup | Yes | For LoRA, we choose the bottleneck rank r = 4 and fine-tune query and value matrices of the attention layers. For non-private SFT, we tune the batch size and the learning rate from the set {8, 16, 32, 64} and in the range [1e-6, 1e-2] respectively. For DP SFT, we set the batch size to 512 and the number of epochs to 40. For DPPPO, we set the minibatch size to 256, the batch size to 4096 and the number of epochs to 100. (These values are collected in the configuration sketch below the table.) |
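
For context on the DPSGD routine that Algorithm 2 refers to (Pseudocode row above), the following is a minimal, generic sketch of a single DP-SGD step: per-example gradients are clipped to a norm bound, summed, perturbed with Gaussian noise, and averaged. This is not the authors' implementation; the function name and the `lr`, `clip_norm`, and `noise_multiplier` parameters are illustrative.

```python
import numpy as np

def dpsgd_step(params, per_example_grads, lr=1e-4,
               clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One generic DP-SGD update: clip each per-example gradient to
    `clip_norm`, sum, add Gaussian noise with std `noise_multiplier *
    clip_norm`, average over the batch, and take a gradient step.
    Illustrative sketch only; not the paper's code."""
    rng = rng if rng is not None else np.random.default_rng()
    batch_size = len(per_example_grads)

    # Per-example clipping bounds each example's contribution (sensitivity).
    clipped_sum = np.zeros_like(params)
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped_sum += g * min(1.0, clip_norm / (norm + 1e-12))

    # Gaussian noise calibrated to the clipping bound; the overall privacy
    # guarantee comes from accounting this noise across all training steps.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_avg_grad = (clipped_sum + noise) / batch_size

    return params - lr * noisy_avg_grad

# Toy usage with a batch of 4 flattened gradients:
params = np.zeros(10)
grads = [np.random.randn(10) for _ in range(4)]
params = dpsgd_step(params, grads, lr=1e-3, clip_norm=1.0, noise_multiplier=0.6)
```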
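
The hyperparameters quoted in the Experiment Setup row are easier to scan when collected in one place. The dictionary below only restates the reported values; the key names are ours and do not correspond to any particular library's configuration API (e.g., TRL's).

```python
# Reported hyperparameters from the paper, grouped for readability.
# Key names are illustrative, not tied to a specific library.
REPORTED_HPARAMS = {
    "lora": {
        "rank": 4,                              # bottleneck rank r
        "target_modules": ["query", "value"],   # attention matrices fine-tuned
    },
    "sft_non_private": {
        "batch_size_grid": [8, 16, 32, 64],     # batch size tuned over this set
        "learning_rate_range": (1e-6, 1e-2),    # learning rate tuned in this range
    },
    "sft_dp": {"batch_size": 512, "epochs": 40},
    "ppo_dp": {"minibatch_size": 256, "batch_size": 4096, "epochs": 100},
}
```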