Token-level Direct Preference Optimization
Authors: Yongcheng Zeng, Guoqing Liu, Weiyu Ma, Ning Yang, Haifeng Zhang, Jun Wang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate the superior performance of our algorithm in three different open-sourced datasets: the IMDb sentiment dataset (Maas et al., 2011), the Anthropic HH dataset (Bai et al., 2022), and MT-bench (Zheng et al., 2023). |
| Researcher Affiliation | Collaboration | 1Institute of Automation, Chinese Academy of Sciences 2School of Artificial Intelligence, University of Chinese Academy of Sciences 3Microsoft Research AI4Science 4University College London. |
| Pseudocode | Yes | Algorithm 1 Token-level Direct Preference Optimization (TDPO) |
| Open Source Code | Yes | Our code is opensourced at https://github.com/Vance0124/Tokenlevel-Direct-Preference-Optimization. |
| Open Datasets | Yes | In this section, we demonstrate the superior performance of our algorithm in three different open-sourced datasets: the IMDb sentiment dataset (Maas et al., 2011), the Anthropic HH dataset (Bai et al., 2022), and MT-bench (Zheng et al., 2023). |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with percentages or sample counts for the datasets used in the experiments. |
| Hardware Specification | No | The paper does not explicitly provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | Appendix B provides PyTorch code snippets using 'import torch' and 'torch.nn.functional', but does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | Unless specified otherwise, we use α = 0.5, β = 0.1, a batch size of 64, and the RMSprop optimizer with a learning rate of 5e-6. We linearly warm up the learning rate from 0 to 5e-6 over 150 steps. |
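The warm-up schedule quoted in the Experiment Setup row can be sketched as a plain learning-rate multiplier. This is a minimal illustration, not code from the paper: the function name `warmup_lr` is ours, and we assume the rate is held at its peak after warm-up, since the quote does not specify any decay.

```python
def warmup_lr(step: int, peak_lr: float = 5e-6, warmup_steps: int = 150) -> float:
    """Linear warm-up from 0 to peak_lr over warmup_steps, then hold.

    Sketch of the schedule described in the paper's setup (peak 5e-6,
    150 warm-up steps); the post-warm-up behaviour is an assumption.
    """
    if step >= warmup_steps:
        return peak_lr
    return peak_lr * step / warmup_steps
```

In a PyTorch training loop this multiplier would typically be wired in via `torch.optim.lr_scheduler.LambdaLR` on top of the RMSprop optimizer mentioned in the setup.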