Position: Near to Mid-term Risks and Opportunities of Open-Source Generative AI

Authors: Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder De Witt, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Botos Csaba, Fabro Steibel, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Marvin Imperial, Juan A. Nolazco-Flores, Lori Landay, Matthew Thomas Jackson, Paul Rottger, Philip Torr, Trevor Darrell, Yong Suk Lee, Jakob Nicolaus Foerster

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To set the stage, we first introduce an AI openness taxonomy system and apply it to 40 current large language models. We analyzed the pipeline components of 40 high-impact LLMs released from 2019 to 2023. In Figure 4, we show the distribution of openness levels for each of the pipeline components analyzed. (An illustrative tabulation sketch follows this table.)
Researcher Affiliation | Academia | 1) University of Oxford, 2) MLCommons, 3) Kenyon College, 4) Center for Computation & Technology, Louisiana State University, 5) Institute for Technology & Society (ITS), Rio, 6) University of California, Berkeley, 7) University of Virginia, 8) Luxembourg Institute of Science and Technology, 9) University of Luxembourg, 10) University of Bath, 11) National University Philippines, 12) ITESM, 13) Berklee College of Music, 14) Bocconi University, 15) University of Notre Dame.
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper analyzes the openness of other large language models and their components, but it does not provide an explicit statement or link indicating that the code for its own analysis is open-sourced.
Open Datasets | No | The paper notes that the LLMs it analyzes are 'evaluated on openly available evaluation datasets such as MMLU or Natural Questions (Hendrycks et al., 2020; Kwiatkowski et al., 2019)', but the paper itself does not use a dataset to train or evaluate a model developed by the authors; instead, it classifies existing models based on publicly available information.
Dataset Splits | No | The paper's empirical component classifies existing models and does not involve training or validating a model, so no training, validation, or test splits are provided for the paper's own analysis.
Hardware Specification | No | The paper presents an analysis and classification of existing models but does not describe any computational experiments or model training that would require specific hardware specifications.
Software Dependencies | No | The paper performs an analysis of existing models but does not detail any specific software dependencies with version numbers used for its own work.
Experiment Setup | No | The paper conducts an empirical analysis by classifying existing LLMs according to their openness, but it does not describe an experimental setup with hyperparameters or training settings for a model or system developed by the authors.
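To make the 'Research Type' entry above concrete, here is a minimal sketch of the kind of per-component openness tabulation the paper describes: classifying the pipeline components of a set of LLMs and reporting the distribution of openness levels per component, as summarized in its Figure 4. The component names, openness levels, and example models below are hypothetical placeholders, not the authors' actual taxonomy or data.

```python
from collections import Counter

# Hypothetical openness labels per pipeline component for a few example models.
# Component names, level names, and model entries are illustrative only; the
# paper's actual taxonomy and its 40-model dataset are not reproduced here.
openness = {
    "model_a": {"training_data": "closed",  "training_code": "open",   "weights": "open"},
    "model_b": {"training_data": "partial", "training_code": "closed", "weights": "open"},
    "model_c": {"training_data": "open",    "training_code": "open",   "weights": "closed"},
}

# Tally how many models fall into each openness level, per pipeline component
# (the kind of per-component distribution a figure like Figure 4 would plot).
components = sorted({comp for labels in openness.values() for comp in labels})
for comp in components:
    counts = Counter(labels[comp] for labels in openness.values() if comp in labels)
    print(f"{comp}: {dict(counts)}")
```

With real classifications for the 40 analyzed models substituted in, the same tally would yield the per-component counts behind a distribution plot of openness levels.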