Transcendence: Generative Models Can Outperform The Experts That Train Them
Authors: Edwin Zhang, Vincent Zhu, Naomi Saphra, Anat Kleiman, Benjamin Edelman, Milind Tambe, Sham Kakade, Eran Malach
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate transcendence by training an autoregressive transformer to play chess from game transcripts, and show that the trained model can sometimes achieve better performance than all players in the dataset. We theoretically prove that transcendence can be enabled by low-temperature sampling, and rigorously assess this claim experimentally. *(A minimal sketch of low-temperature sampling follows the table.)* |
| Researcher Affiliation | Collaboration | Edwin Zhang (OpenAI, Harvard University, Humanity Unleashed) edwin@openai.com; Vincent Zhu (UC Santa Barbara, Humanity Unleashed) vincentzhu@ucsb.edu; Naomi Saphra (Harvard University, Kempner Institute) nsaphra@g.harvard.edu; Anat Kleiman (Harvard University, Apple) anatkleiman@g.harvard.edu; Benjamin L. Edelman (Princeton University, Harvard University) bedelman@g.harvard.edu; Milind Tambe (Harvard University) tambe@g.harvard.edu; Sham Kakade (Harvard University, Kempner Institute) sham@g.harvard.edu; Eran Malach (Harvard University, Kempner Institute) emalach@g.harvard.edu |
| Pseudocode | No | The paper includes theoretical proofs in the appendix but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | To play with our models, code, and data, please see our website at https://transcendence.eddie.win. We also release our code openly to support further research into transcendence... |
| Open Datasets | Yes | Our dataset consists of human chess games from the lichess.org open source database from January 2023 to October 2023. We use the Glicko-2 rating system [7], which is also adopted by https://lichess.org, the free and open-source online chess server from which we source our dataset. |
| Dataset Splits | No | The paper discusses training on datasets and testing against Stockfish, but does not explicitly state the training, validation, and test dataset splits with percentages or counts. |
| Hardware Specification | Yes | We train all of our models on the Nvidia H100 80GB GPU. |
| Software Dependencies | No | The paper mentions 'AdamW [13]' for the optimizer and refers to 'modern large model training' and the OPT-175B team's practices, but does not provide specific version numbers for software dependencies such as Python, PyTorch, or other libraries. *(A hedged optimizer-setup sketch follows the table.)* |
| Experiment Setup | Yes | We give a full list of the hyperparameters we used for training here. |
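The transcendence claim rests on low-temperature sampling: as temperature approaches zero, sampling approaches the argmax over the model's move distribution, which denoises toward the majority choice of the expert mixture in the training data. Below is a minimal sketch of temperature-scaled sampling, assuming PyTorch; `sample_move`, its tensor shapes, and the placeholder vocabulary size are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of temperature-scaled sampling, assuming PyTorch.
# `logits` is a 1-D tensor over the move vocabulary; all names here are
# illustrative, not the authors' released code.
import torch
import torch.nn.functional as F

def sample_move(logits: torch.Tensor, temperature: float) -> int:
    """Sample one move index. As temperature -> 0 this approaches argmax,
    concentrating probability mass on the modal (majority-vote) move."""
    if temperature <= 0:
        return int(torch.argmax(logits))
    probs = F.softmax(logits / temperature, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))

# Usage: a low temperature sharpens the distribution toward the modal move.
logits = torch.randn(4096)  # placeholder vocabulary size (assumption)
move = sample_move(logits, temperature=0.001)
```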
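The dependencies row notes that the paper cites AdamW without pinning versions. For concreteness, here is a hedged sketch of such an optimizer setup, assuming PyTorch; the model stand-in, learning rate, and weight decay are placeholders, not hyperparameters reported by the paper.

```python
# Hedged sketch of an AdamW setup, assuming PyTorch.
# The values below are placeholders, not the paper's hyperparameters.
import torch

model = torch.nn.Linear(512, 512)  # stand-in for the autoregressive transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
```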