OpenChat: Advancing Open-source Language Models with Mixed-Quality Data

Authors: Guan Wang, Sijie Cheng, Xianyuan Zhan, Xiangang Li, Sen Song, Yang Liu

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments on three standard benchmarks, our openchat-13b fine-tuned with C-RLFT achieves the highest average performance among all 13b open-source language models. Moreover, we use AGIEval to validate the model generalization performance, in which only openchat-13b surpasses the base model. Finally, we conduct a series of analyses to shed light on the effectiveness and robustness of OpenChat.
Researcher Affiliation | Collaboration | Guan Wang (1,2), Sijie Cheng (1,3,5), Xianyuan Zhan (3,4), Xiangang Li (5), Sen Song (2, corresponding), Yang Liu (1,3,4, corresponding). 1 Department of Computer Science and Technology, Tsinghua University; 2 Laboratory of Brain and Intelligence, Tsinghua University; 3 Institute for AI Industry Research (AIR), Tsinghua University; 4 Shanghai Artificial Intelligence Laboratory; 5 01.AI
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code, data, and models are publicly available at https://github.com/imoneoi/openchat and https://huggingface.co/openchat.
Open Datasets | Yes | Following Vicuna (Chiang et al., 2023), we adopt a widely-used SFT dataset, the ShareGPT dataset. The ShareGPT dataset consists of approximately 70k user-shared conversations, including around 6k expert conversations generated by GPT-4 and the remaining sub-optimal conversations from GPT-3.5. We perform experiments to assess their varying quality in Sec. 5.1. The ShareGPT dataset is collected from https://sharegpt.com/. (A data-weighting sketch based on this description follows the table.)
Dataset Splits | No | The paper specifies training details such as 'fine-tune the model for 5 epochs on the ShareGPT dataset' and 'an effective batch size of 200k tokens', but it does not specify a separate validation split or describe how validation was performed during fine-tuning on the ShareGPT data.
Hardware Specification | No | The paper mentions fine-tuning a model but does not specify any hardware details such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies | No | The paper mentions using the AdamW optimizer and a cosine learning rate schedule, but it does not specify software dependencies with version numbers (e.g., the Python version or the version of a deep learning framework such as PyTorch or TensorFlow).
Experiment Setup | Yes | The openchat-13b is based on the llama-2-13b (Touvron et al., 2023b). We fine-tune the model for 5 epochs on the ShareGPT dataset using the AdamW optimizer with a sequence length of 4,096 tokens and an effective batch size of 200k tokens. Given that the reward weight term in Eq. (6), exp(r_c/β), remains constant within a class, we simplify the process by assigning a unit weight to D_exp and a weight of 0.1 to D_sub. The AdamW optimizer's hyperparameters are set as follows: β1 = 0.9, β2 = 0.95, ε = 10^-5, and weight decay of 0.1. We employ a cosine learning rate schedule with a maximum learning rate of 6.7 × 10^-5, which decays to 10% of the maximum value. (A hedged sketch of this training configuration also follows the table.)
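
For the Open Datasets row: a minimal sketch, not the authors' released pipeline, of how the mixed-quality ShareGPT conversations could be partitioned into the expert (GPT-4) subset D_exp and the sub-optimal (GPT-3.5) subset D_sub and tagged with the coarse-grained class weights quoted in the Experiment Setup row. The file name `sharegpt.json` and the `source` field used to identify the generating model are assumptions; the released code at https://github.com/imoneoi/openchat may organize the data differently.

```python
import json

# Class weights quoted in the paper's setup: unit weight for the expert
# (GPT-4) subset D_exp, 0.1 for the sub-optimal (GPT-3.5) subset D_sub.
EXPERT_WEIGHT = 1.0
SUBOPTIMAL_WEIGHT = 0.1

def load_mixed_quality_data(path="sharegpt.json"):
    """Split ShareGPT-style conversations by generating model and attach
    the class weight used for coarse-grained reward labeling.

    Assumes each record carries a `source` field naming the model that
    produced the assistant turns ("gpt-4" vs. "gpt-3.5"); the actual
    dataset may encode this information differently.
    """
    with open(path) as f:
        records = json.load(f)

    weighted = []
    for rec in records:
        is_expert = rec.get("source", "").startswith("gpt-4")
        weighted.append({
            "conversation": rec["conversations"],
            "class": "expert" if is_expert else "sub-optimal",
            "weight": EXPERT_WEIGHT if is_expert else SUBOPTIMAL_WEIGHT,
        })
    return weighted
```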
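
For the Experiment Setup row: a hedged PyTorch sketch of the quoted optimizer and schedule settings (AdamW with β1 = 0.9, β2 = 0.95, ε = 10^-5, weight decay 0.1; cosine decay from a peak of 6.7 × 10^-5 to 10% of that value), together with a per-example weighted cross-entropy that mirrors the unit/0.1 class weights. The model, labels, and step count are placeholders, and details the paper does not state (e.g., warmup) are omitted; this is an illustration of the reported configuration, not the authors' training code.

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

MAX_LR = 6.7e-5       # peak learning rate quoted in the paper
MIN_LR_RATIO = 0.1    # cosine schedule decays to 10% of the peak

def build_optimizer_and_scheduler(model, total_steps):
    """AdamW plus cosine schedule with the quoted hyperparameters.
    `total_steps` must be derived from the 5-epoch / 200k-token effective
    batch setup, which is not spelled out here."""
    optimizer = AdamW(
        model.parameters(),
        lr=MAX_LR,
        betas=(0.9, 0.95),
        eps=1e-5,
        weight_decay=0.1,
    )

    def cosine_with_floor(step):
        # Multiplier on MAX_LR: 1.0 at step 0, MIN_LR_RATIO at the end.
        progress = min(step / max(total_steps, 1), 1.0)
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
        return MIN_LR_RATIO + (1.0 - MIN_LR_RATIO) * cosine

    scheduler = LambdaLR(optimizer, lr_lambda=cosine_with_floor)
    return optimizer, scheduler

def weighted_sft_loss(logits, labels, example_weights):
    """Next-token cross-entropy scaled per example by the class weight
    (1.0 for expert conversations, 0.1 for sub-optimal ones)."""
    # logits: (batch, seq, vocab); labels: (batch, seq) with -100 on ignored tokens
    per_token = torch.nn.functional.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        labels[:, 1:].reshape(-1),
        ignore_index=-100,
        reduction="none",
    ).view(labels.size(0), -1)
    mask = (labels[:, 1:] != -100).float()
    per_example = (per_token * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    return (example_weights * per_example).mean()
```

The scheduler keeps a floor at 10% of the peak rather than decaying to zero, matching the "decays to 10% of the maximum value" wording; whether the authors implemented the floor this way is an assumption.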