OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
Authors: Guan Wang, Sijie Cheng, Xianyuan Zhan, Xiangang Li, Sen Song, Yang Liu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on three standard benchmarks, our openchat-13b fine-tuned with C-RLFT achieves the highest average performance among all 13b open-source language models. Moreover, we use AGIEval to validate the model generalization performance, in which only openchat-13b surpasses the base model. Finally, we conduct a series of analyses to shed light on the effectiveness and robustness of OpenChat. |
| Researcher Affiliation | Collaboration | Guan Wang (1,2), Sijie Cheng (1,3,5), Xianyuan Zhan (3,4), Xiangang Li (5), Sen Song (2, corresponding author), Yang Liu (1,3,4, corresponding author). 1 Department of Computer Science and Technology, Tsinghua University; 2 Laboratory of Brain and Intelligence, Tsinghua University; 3 Institute for AI Industry Research (AIR), Tsinghua University; 4 Shanghai Artificial Intelligence Laboratory; 5 01.AI |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code, data, and models are publicly available at https://github.com/imoneoi/openchat and https://huggingface.co/openchat. |
| Open Datasets | Yes | Following Vicuna (Chiang et al., 2023), we adopt a widely-used SFT dataset, the ShareGPT dataset. The ShareGPT dataset consists of approximately 70k user-shared conversations, including around 6k expert conversations generated by GPT-4 and the remaining sub-optimal conversations from GPT-3.5. We perform experiments to assess their varying quality in Sec. 5.1. The ShareGPT dataset is collected from https://sharegpt.com/. [See the data-partition sketch below the table.] |
| Dataset Splits | No | The paper specifies training details such as 'fine-tune the model for 5 epochs on the ShareGPT dataset' and 'an effective batch size of 200k tokens', but it does not specify a separate validation dataset split or how validation was performed during their fine-tuning process on the ShareGPT data. |
| Hardware Specification | No | The paper mentions fine-tuning a model but does not specify any hardware details like GPU models, CPU types, or memory used for the experiments. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer and a cosine learning rate schedule, but it does not specify software dependencies with version numbers (e.g., Python version, specific deep learning framework version like PyTorch or TensorFlow). |
| Experiment Setup | Yes | The openchat-13b is based on the llama-2-13b (Touvron et al., 2023b). We fine-tune the model for 5 epochs on the ShareGPT dataset using the AdamW optimizer with a sequence length of 4,096 tokens and an effective batch size of 200k tokens. Given that the reward weight term in Eq. (6), exp(r_c / β), remains constant within a class, we simplify the process by assigning a unit weight to D_exp and a weight of 0.1 to D_sub. The AdamW optimizer's hyperparameters are set as follows: β1 = 0.9, β2 = 0.95, ϵ = 10^-5, and weight decay of 0.1. We employ a cosine learning rate schedule with a maximum learning rate of 6.7 × 10^-5, which decays to 10% of the maximum value. [See the configuration sketch below the table.] |
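
For the Open Datasets row, the sketch below illustrates one way the ShareGPT conversations could be partitioned into the two C-RLFT data classes the paper describes: roughly 6k expert conversations generated by GPT-4 and the remaining sub-optimal conversations from GPT-3.5. This is a hypothetical illustration, not the authors' preprocessing code; the `model` field is an assumed attribute of the raw conversation records.

```python
def split_by_source(conversations):
    """Partition ShareGPT-style records into C-RLFT classes.

    Hypothetical sketch: assumes each conversation is a dict whose
    (assumed) "model" field names the model that generated it.
    """
    expert, sub_optimal = [], []
    for conv in conversations:
        if conv.get("model") == "gpt-4":   # ~6k expert conversations
            expert.append(conv)
        else:                              # remaining GPT-3.5 conversations
            sub_optimal.append(conv)
    return expert, sub_optimal
```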
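For the Experiment Setup row, the following is a minimal PyTorch sketch of the stated fine-tuning configuration, assuming a standard causal-LM training loop. It is not the authors' released code (see the repository linked in the Open Source Code row). The hyperparameter values are taken from the quoted setup; `build_optimizer_and_scheduler`, `weighted_sft_loss`, and the `CLASS_WEIGHT` mapping are hypothetical names introduced here for illustration.

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

MAX_LR = 6.7e-5            # maximum learning rate reported in the paper
MIN_LR_FRAC = 0.10         # cosine schedule decays to 10% of the maximum
EPOCHS = 5
SEQ_LEN = 4096             # sequence length in tokens
EFFECTIVE_BATCH_TOKENS = 200_000

# Simplified C-RLFT weights: since exp(r_c / beta) is constant within a
# class, expert (GPT-4) data gets unit weight and sub-optimal (GPT-3.5)
# data gets weight 0.1, as stated in the Experiment Setup quote.
CLASS_WEIGHT = {"expert": 1.0, "sub_optimal": 0.1}

def build_optimizer_and_scheduler(model, total_steps):
    """AdamW with the paper's hyperparameters plus a cosine decay to 10%."""
    optimizer = AdamW(
        model.parameters(),
        lr=MAX_LR,
        betas=(0.9, 0.95),
        eps=1e-5,
        weight_decay=0.1,
    )

    def lr_lambda(step):
        # Cosine interpolation from 1.0 down to MIN_LR_FRAC of the base lr.
        progress = min(step / max(total_steps, 1), 1.0)
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
        return MIN_LR_FRAC + (1.0 - MIN_LR_FRAC) * cosine

    scheduler = LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler

def weighted_sft_loss(logits, labels, data_class):
    """Token-level cross-entropy scaled by the class-conditioned weight."""
    loss = torch.nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,   # masked (non-assistant) tokens
    )
    return CLASS_WEIGHT[data_class] * loss
```

The sketch folds the class-conditioned reward weight directly into the loss, which matches the paper's simplification of the Eq. (6) weighting into per-class constants; any warmup schedule or distributed-training details are not specified in the paper and are therefore omitted here.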