Privately Aligning Language Models with Reinforcement Learning

Authors: Fan Wu, Huseyin A. Inan, Arturs Backurs, Varun Chandrasekaran, Janardhan Kulkarni, Robert Sim

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results validate the effectiveness of our approach, offering competitive utility while ensuring strong privacy protections. We empirically evaluate our DP framework on commonly studied tasks (in non-privacy literature).
Researcher Affiliation | Collaboration | Fan Wu¹, Huseyin A. Inan², Arturs Backurs³, Varun Chandrasekaran¹, Janardhan Kulkarni³, Robert Sim²; ¹University of Illinois Urbana-Champaign, ²M365 Research, ³Microsoft Research
Pseudocode | Yes | We give a complete pseudo-code of the PPO implementation in Appendix F. Algorithm 2: Differential Privacy Stochastic Gradient Descent (DPSGD). Algorithm 3: Aligning language models with RL (PPO), full version. (A minimal DP-SGD sketch follows the table.)
Open Source Code | No | The paper mentions using the TRL framework and links to its documentation (https://huggingface.co/docs/trl/index), but it does not provide a specific link or an explicit statement about releasing its own implementation of the methodology described in the paper. (A TRL-based PPO sketch follows the table.)
Open Datasets | Yes | for a positive review generation task on the IMDb dataset (Maas et al., 2011), and (ii) alignment via RL from human feedback (RLHF) for a summarization task on the Reddit TL;DR dataset (Völske et al., 2017).
Dataset Splits | Yes | alignment with DP uses half of the training dataset in the SFT step and the remaining half in the RL step. Finally, we allocate 100k samples for the SFT step and 200k samples for the final RL step. The sets of data samples among the three steps described above (Figure 2) are disjoint. (A split sketch follows the table.)
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments. It only mentions "compute constraints" in a general sense.
Software Dependencies | No | The paper mentions using the "TRL framework" and GPT-2 models but does not provide specific version numbers for any software dependencies like PyTorch, Python, or other libraries.
Experiment Setup | Yes | For LoRA, we choose the bottleneck rank r = 4 and fine-tune the query and value matrices of the attention layers. For non-private SFT, we tune the batch size and the learning rate from the set {8, 16, 32, 64} and in the range [1e-6, 1e-2] respectively. For DP SFT, we set the batch size to 512 and the number of epochs to 40. For DP PPO, we set the minibatch size to 256, the batch size to 4096 and the number of epochs to 100. (A hyperparameter summary follows the table.)
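
The Pseudocode row points to Algorithm 2 (DPSGD) in the paper's appendix. As a rough illustration only, the following is a minimal PyTorch sketch of one DP-SGD step (per-example gradient clipping plus Gaussian noise); it is not the authors' pseudocode, and the model, batch, clip norm, and noise multiplier are placeholders.

```python
# Minimal DP-SGD step sketch: clip each per-example gradient, sum, add Gaussian
# noise, average, and apply the optimizer update. Illustrative only; the
# hyperparameter values are placeholders, not taken from the paper.
import torch

def dpsgd_step(model, loss_fn, batch, optimizer, clip_norm=1.0, noise_multiplier=1.0):
    xs, ys = batch  # tensors of inputs and targets for one logical batch
    summed = [torch.zeros_like(p) for p in model.parameters()]

    for x, y in zip(xs, ys):  # compute per-example gradients
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        grads = [p.grad.detach().clone() if p.grad is not None else torch.zeros_like(p)
                 for p in model.parameters()]
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)  # L2 clip to clip_norm
        for s, g in zip(summed, grads):
            s.add_(g * scale)

    model.zero_grad()
    for p, s in zip(model.parameters(), summed):
        noise = torch.randn_like(s) * noise_multiplier * clip_norm  # Gaussian mechanism
        p.grad = (s + noise) / len(xs)  # noisy average gradient
    optimizer.step()
```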
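
The Open Source Code row notes that the paper builds on the TRL framework without releasing its own code. The sketch below shows what a bare, non-private PPO alignment step looks like, assuming the TRL 0.x-era API (PPOConfig, PPOTrainer, AutoModelForCausalLMWithValueHead); signatures differ in newer TRL releases, the toy batch sizes and constant rewards are placeholders, and this is not the paper's DP variant.

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# Toy configuration; real runs would use the paper's batch sizes and a reward model.
config = PPOConfig(model_name="gpt2", batch_size=8, mini_batch_size=4)
model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

# One PPO step on a toy batch: generate continuations, score them, update the policy.
queries = ["The movie was", "I watched this film and"] * 4
query_tensors = [tokenizer(q, return_tensors="pt").input_ids.squeeze(0) for q in queries]

response_tensors = []
for q in query_tensors:
    out = ppo_trainer.generate(q, max_new_tokens=16, pad_token_id=tokenizer.eos_token_id)
    response_tensors.append(out.squeeze(0)[q.shape[0]:])  # keep only generated tokens

rewards = [torch.tensor(1.0) for _ in queries]  # placeholder; IMDb uses a sentiment reward
stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
```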
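
The Dataset Splits row describes disjoint data allocations across the SFT and RL steps. Below is a minimal sketch of such a disjoint partition, assuming the Hugging Face datasets library and the IMDb task; the 100k/200k allocation quoted above belongs to the paper's setup, not to this toy split.

```python
from datasets import load_dataset

# Shuffle once, then carve out non-overlapping index ranges so the SFT and RL
# steps never see the same examples (disjoint by construction).
train = load_dataset("imdb", split="train").shuffle(seed=42)

half = len(train) // 2
sft_data = train.select(range(half))             # used only for supervised fine-tuning
rl_data = train.select(range(half, len(train)))  # used only for the RL (PPO) step
```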
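
Finally, the hyperparameters quoted in the Experiment Setup row are collected below for reference. The dictionary layout and key names are my own and purely illustrative; only the numeric values come from the row above.

```python
# Hyperparameters quoted in the Experiment Setup row; structure is illustrative.
experiment_setup = {
    "lora": {
        "rank": 4,                               # bottleneck rank r = 4
        "target": "attention query and value matrices",
    },
    "non_private_sft": {
        "batch_size_grid": [8, 16, 32, 64],      # tuned over this set
        "learning_rate_range": (1e-6, 1e-2),     # tuned within this range
    },
    "dp_sft": {"batch_size": 512, "epochs": 40},
    "dp_ppo": {"minibatch_size": 256, "batch_size": 4096, "epochs": 100},
}
```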