Privately Aligning Language Models with Reinforcement Learning

Authors: Fan Wu, Huseyin A. Inan, Arturs Backurs, Varun Chandrasekaran, Janardhan Kulkarni, Robert Sim

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results validate the effectiveness of our approach, offering competitive utility while ensuring strong privacy protections. We empirically evaluate our DP framework on commonly studied tasks (in non-privacy literature).
Researcher Affiliation | Collaboration | Fan Wu¹, Huseyin A. Inan², Arturs Backurs³, Varun Chandrasekaran¹, Janardhan Kulkarni³, Robert Sim²; ¹University of Illinois Urbana-Champaign, ²M365 Research, ³Microsoft Research
Pseudocode | Yes | We give a complete pseudo-code of the PPO implementation in Appendix F. Algorithm 2: Differential Privacy Stochastic Gradient Descent (DPSGD). Algorithm 3: Aligning language models with RL (PPO), full version. (A minimal DP-SGD sketch follows the table.)
Open Source Code | No | The paper mentions using the TRL framework and links to its documentation (https://huggingface.co/docs/trl/index), but it does not provide a specific link or an explicit statement about releasing its own implementation of the methodology described in the paper. (A TRL-based PPO sketch follows the table.)
Open Datasets | Yes | for a positive review generation task on the IMDb dataset (Maas et al., 2011), and (ii) alignment via RL from human feedback (RLHF) for a summarization task on the Reddit TL;DR dataset (Völske et al., 2017).
Dataset Splits | Yes | alignment with DP uses half of the training dataset in the SFT step and the remaining half in the RL step. Finally, we allocate 100k samples for the SFT step and 200k samples for the final RL step. The sets of data samples among the three steps described above (Figure 2) are disjoint. (A split sketch follows the table.)
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments. It only mentions "compute constraints" in a general sense.
Software Dependencies | No | The paper mentions using the "TRL framework" and GPT-2 models but does not provide specific version numbers for any software dependencies like PyTorch, Python, or other libraries.
Experiment Setup | Yes | For LoRA, we choose the bottleneck rank r = 4 and fine-tune the query and value matrices of the attention layers. For non-private SFT, we tune the batch size and the learning rate from the set {8, 16, 32, 64} and in the range [1e-6, 1e-2] respectively. For DP SFT, we set the batch size to 512 and the number of epochs to 40. For DP PPO, we set the minibatch size to 256, the batch size to 4096 and the number of epochs to 100. (A hyperparameter summary follows the table.)
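
The Pseudocode row points to Algorithm 2 (DPSGD) in the paper's appendix. As a rough illustration only, the following is a minimal PyTorch sketch of one DP-SGD step (per-example gradient clipping plus Gaussian noise); it is not the authors' pseudocode, and the model, batch, clip norm, and noise multiplier are placeholders.

```python
# Minimal DP-SGD step sketch: clip each per-example gradient, sum, add Gaussian
# noise, average, and apply the optimizer update. Illustrative only; the
# hyperparameter values are placeholders, not taken from the paper.
import torch

def dpsgd_step(model, loss_fn, batch, optimizer, clip_norm=1.0, noise_multiplier=1.0):
    xs, ys = batch  # tensors of inputs and targets for one logical batch
    summed = [torch.zeros_like(p) for p in model.parameters()]

    for x, y in zip(xs, ys):  # compute per-example gradients
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        grads = [p.grad.detach().clone() if p.grad is not None else torch.zeros_like(p)
                 for p in model.parameters()]
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)  # L2 clip to clip_norm
        for s, g in zip(summed, grads):
            s.add_(g * scale)

    model.zero_grad()
    for p, s in zip(model.parameters(), summed):
        noise = torch.randn_like(s) * noise_multiplier * clip_norm  # Gaussian mechanism
        p.grad = (s + noise) / len(xs)  # noisy average gradient
    optimizer.step()
```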
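
The Open Source Code row notes that the paper builds on the TRL framework without releasing its own code. The sketch below shows what a bare, non-private PPO alignment step looks like, assuming the TRL 0.x-era API (PPOConfig, PPOTrainer, AutoModelForCausalLMWithValueHead); signatures differ in newer TRL releases, the toy batch sizes and constant rewards are placeholders, and this is not the paper's DP variant.

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# Toy configuration; real runs would use the paper's batch sizes and a reward model.
config = PPOConfig(model_name="gpt2", batch_size=8, mini_batch_size=4)
model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

# One PPO step on a toy batch: generate continuations, score them, update the policy.
queries = ["The movie was", "I watched this film and"] * 4
query_tensors = [tokenizer(q, return_tensors="pt").input_ids.squeeze(0) for q in queries]

response_tensors = []
for q in query_tensors:
    out = ppo_trainer.generate(q, max_new_tokens=16, pad_token_id=tokenizer.eos_token_id)
    response_tensors.append(out.squeeze(0)[q.shape[0]:])  # keep only generated tokens

rewards = [torch.tensor(1.0) for _ in queries]  # placeholder; IMDb uses a sentiment reward
stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
```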
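
The Dataset Splits row describes disjoint data allocations across the SFT and RL steps. Below is a minimal sketch of such a disjoint partition, assuming the Hugging Face datasets library and the IMDb task; the 100k/200k allocation quoted above belongs to the paper's setup, not to this toy split.

```python
from datasets import load_dataset

# Shuffle once, then carve out non-overlapping index ranges so the SFT and RL
# steps never see the same examples (disjoint by construction).
train = load_dataset("imdb", split="train").shuffle(seed=42)

half = len(train) // 2
sft_data = train.select(range(half))             # used only for supervised fine-tuning
rl_data = train.select(range(half, len(train)))  # used only for the RL (PPO) step
```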
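
Finally, the hyperparameters quoted in the Experiment Setup row are collected below for reference. The dictionary layout and key names are my own and purely illustrative; only the numeric values come from the row above.

```python
# Hyperparameters quoted in the Experiment Setup row; structure is illustrative.
experiment_setup = {
    "lora": {
        "rank": 4,                               # bottleneck rank r = 4
        "target": "attention query and value matrices",
    },
    "non_private_sft": {
        "batch_size_grid": [8, 16, 32, 64],      # tuned over this set
        "learning_rate_range": (1e-6, 1e-2),     # tuned within this range
    },
    "dp_sft": {"batch_size": 512, "epochs": 40},
    "dp_ppo": {"minibatch_size": 256, "batch_size": 4096, "epochs": 100},
}
```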