Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
ODIN: Disentangled Reward Mitigates Hacking in RLHF
Authors: Lichang Chen, Chen Zhu, Jiuhai Chen, Davit Soselia, Tianyi Zhou, Tom Goldstein, Heng Huang, Mohammad Shoeybi, Bryan Catanzaro
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Based on this evaluation, we conduct large-scale studies, where the results shed insights into the efficacy of hyperparameters and tricks used in RL on mitigating length bias. Experiments demonstrate that our approach eliminates the reward correlation with length, and improves the obtained policy by a significant margin. |
| Researcher Affiliation | Collaboration | ¹University of Maryland, College Park; ²Meta (work done while at Nvidia); ³Nvidia. |
| Pseudocode | Yes | Algorithm 1 Proximal Policy Optimization for RLHF |
| Open Source Code | No | The paper does not provide an explicit statement or link confirming the release of their source code for the described methodology. |
| Open Datasets | Yes | We use the Open Assistant dataset (Köpf et al., 2023) |
| Dataset Splits | Yes | We tried different learning rates from {1e-5, 3e-5, 5e-5} with batch size 128 for tuning both the baseline RM and ODIN on 22k preference data for 3 epochs, and picked the one with the highest validation accuracy for both. |
| Hardware Specification | Yes | All experiments are implemented with DeepSpeed-Chat (Yao et al., 2023) and Huggingface Transformers (Wolf et al., 2020), running on 8 NVIDIA A100 80GB GPUs. |
| Software Dependencies | No | The paper mentions software like DeepSpeed-Chat and Huggingface Transformers but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | We search η ∈ {5e-7, 1e-6, 2e-6}, ϵ ∈ {0.1, 0.2, 0.4}, β ∈ {2.5e-3, 5e-3, 1e-2, 2e-2}, c ∈ {inf, 2, 4}, and N ∈ {32, 64, 256}. Note we did not finish all experiments with β = 2.5e-3, but we have included the partial results in the plots when β = 2.5e-3 is not explicitly excluded. |
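
To give a sense of the size of the search space quoted in the Experiment Setup row, the following is a minimal sketch that enumerates the full Cartesian product of the listed values. It is not taken from the paper's code (none was released). The key names `eta`, `epsilon`, `beta`, `c`, and `N` are placeholders for the symbols in the row, which this summary does not define, and the assumption that the search was a full grid is ours; the row itself notes that runs with β = 2.5e-3 were only partially completed.

```python
from itertools import product

# Values copied from the Experiment Setup row. The keys are illustrative
# stand-ins for the symbols η, ϵ, β, c, N; their roles are not defined here.
grid = {
    "eta": [5e-7, 1e-6, 2e-6],
    "epsilon": [0.1, 0.2, 0.4],
    "beta": [2.5e-3, 5e-3, 1e-2, 2e-2],
    "c": [float("inf"), 2, 4],
    "N": [32, 64, 256],
}

# Assumption: a full grid. Each configuration is one dict of symbol -> value.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]

# The row says the beta = 2.5e-3 runs were not all finished, so it can be
# useful to look at the grid without that value as well.
without_smallest_beta = [cfg for cfg in configs if cfg["beta"] != 2.5e-3]

print(len(configs))                # 3 * 3 * 4 * 3 * 3 = 324
print(len(without_smallest_beta))  # 3 * 3 * 3 * 3 * 3 = 243
```

Under the full-grid assumption this gives 324 configurations, or 243 when β = 2.5e-3 is left out, which puts the cost of the search on 8 GPUs in context.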