LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
Authors: Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Tianle Li, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zhuohan Li, Zi Lin, Eric Xing, Joseph E. Gonzalez, Ion Stoica, Hao Zhang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate its versatility through four use cases: developing content moderation models that perform similarly to GPT-4, building a safety benchmark, training instruction-following models that perform similarly to Vicuna, and creating challenging benchmark questions. The results are presented in Table 3. |
| Researcher Affiliation | Academia | 1 UC Berkeley 2 UC San Diego 3 Carnegie Mellon University 4 Stanford 5 MBZUAI |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The dataset is publicly available at https://huggingface.co/datasets/lmsys/lmsys-chat-1m. The code for the collection website is publicly available at https://github.com/lm-sys/FastChat/tree/v0.2.26#serving-with-web-gui. |
| Open Datasets | Yes | The dataset is publicly available at https://huggingface.co/datasets/lmsys/lmsys-chat-1m. LMSYS-Chat-1M is collected on our website2 from April to August 2023. |
| Dataset Splits | No | The paper describes how data were selected for the training and evaluation sets of specific use cases, but it provides no train/validation/test splits or percentages that would allow the data partitioning to be reproduced directly. |
| Hardware Specification | Yes | We utilize dozens of A100 GPUs to host our website, serving a total of 25 models over the course of the timespan. |
| Software Dependencies | Yes | The text-moderation-latest (006) is the latest OpenAI moderation API (OpenAI, 2023b), introduced on 2023/08/25. |
| Experiment Setup | Yes | Instead of developing a classifier, we fine-tune a language model to generate explanations for why a particular message was flagged, based on the system prompt described in the moderation task (see Appendix B.2). The detailed system prompt and few-shot examples can be found in Appendix B.7. |
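Since the dataset is public on Hugging Face and, per the paper, each conversation carries OpenAI moderation labels, a common first step is filtering out flagged conversations before fine-tuning. The sketch below illustrates this with toy records; the field names (`conversation_id`, `model`, `conversation`, `openai_moderation`) are assumed to mirror the published schema and should be checked against the dataset card.

```python
def keep_unflagged(records, model_name=None):
    """Return records with no message flagged by the moderation API,
    optionally restricted to a single model's conversations."""
    kept = []
    for rec in records:
        # Drop the conversation if any per-message moderation entry is flagged.
        if any(m.get("flagged", False) for m in rec.get("openai_moderation", [])):
            continue
        # Optionally keep only conversations with a specific model.
        if model_name is not None and rec.get("model") != model_name:
            continue
        kept.append(rec)
    return kept

# Toy records mimicking the assumed schema (not real dataset rows).
records = [
    {"conversation_id": "a", "model": "vicuna-13b",
     "conversation": [{"role": "user", "content": "hi"}],
     "openai_moderation": [{"flagged": False}]},
    {"conversation_id": "b", "model": "vicuna-13b",
     "conversation": [{"role": "user", "content": "..."}],
     "openai_moderation": [{"flagged": True}]},
]
clean = keep_unflagged(records, model_name="vicuna-13b")
```

For the real dataset, the same filter can be applied after `datasets.load_dataset("lmsys/lmsys-chat-1m")`; only the iteration over records changes.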