Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
Authors: Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Tianle Li, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zhuohan Li, Zi Lin, Eric Xing, Joseph E. Gonzalez, Ion Stoica, Hao Zhang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate its versatility through four use cases: developing content moderation models that perform similarly to GPT-4, building a safety benchmark, training instruction-following models that perform similarly to Vicuna, and creating challenging benchmark questions. The results are presented in Table 3. |
| Researcher Affiliation | Academia | 1 UC Berkeley, 2 UC San Diego, 3 Carnegie Mellon University, 4 Stanford, 5 MBZUAI |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The dataset is publicly available at https://huggingface.co/datasets/lmsys/lmsys-chat-1m. The code for this website is publicly available (footnote 3: https://github.com/lm-sys/FastChat/tree/v0.2.26#serving-with-web-gui). |
| Open Datasets | Yes | The dataset is publicly available at https://huggingface.co/datasets/lmsys/lmsys-chat-1m. LMSYS-Chat-1M is collected on our website from April to August 2023. |
| Dataset Splits | No | The paper describes data selection for training and evaluation sets for specific tasks, but does not provide specific train/validation/test splits or percentages for any of its models' training processes that would allow direct reproduction of the data partitioning. |
| Hardware Specification | Yes | We utilize dozens of A100 GPUs to host our website, serving a total of 25 models over the course of the timespan. |
| Software Dependencies | Yes | The text-moderation-latest (006) is the latest OpenAI moderation API (OpenAI, 2023b) introduced on 2023/8/25. |
| Experiment Setup | Yes | Instead of developing a classifier, we fine-tune a language model to generate explanations for why a particular message was flagged, based on the system prompt described in the moderation task (see Appendix B.2). The detailed system prompt and few-shot examples can be found in Appendix B.7. |
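
This page does not state how the Yes/No variables are combined into a score, so the following is only a minimal sketch of one way to summarize the table: a plain tally of the binary variables marked Yes. The `results` dictionary and the `tally` helper are illustrative assumptions, copied by hand from the Result column above, and are not the scoring method of Coakley et al.

```python
# Result column of the table above, copied by hand (variable -> result).
# Only the Yes/No variables are included; Research Type and Researcher
# Affiliation are categorical and left out of this illustrative tally.
results = {
    "Pseudocode": "No",
    "Open Source Code": "Yes",
    "Open Datasets": "Yes",
    "Dataset Splits": "No",
    "Hardware Specification": "Yes",
    "Software Dependencies": "Yes",
    "Experiment Setup": "Yes",
}


def tally(results):
    """Return (number of variables marked 'Yes', total number of variables)."""
    satisfied = [name for name, value in results.items() if value == "Yes"]
    return len(satisfied), len(results)


yes_count, total = tally(results)
print(f"{yes_count} of {total} binary variables are marked Yes")
```

Because each label comes from an LLM-based classifier, a tally like this inherits the same uncertainty as the individual rows and should be read as an estimate.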