Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Position: Data-driven Discovery with Large Generative Models

Authors: Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal, Sanchaita Hazra, Ashish Sabharwal, Peter Clark

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Then, through DATAVOYAGER, a proof-of-concept utilizing GPT-4, we demonstrate how LGMs fulfill several of these desiderata, a feat previously unattainable, while also highlighting important limitations in the current system that open up opportunities for novel ML research.
Researcher Affiliation	Collaboration	(1) Allen Institute for AI, (2) Open Locus, (3) University of Massachusetts Amherst, (4) University of Utah. Correspondence to: Bodhisattwa Prasad Majumder <EMAIL>, Harshit Surana <EMAIL>.
Pseudocode	No	The paper includes code snippets within the text (e.g., Python code for statistical tests) and diagrams illustrating system architecture, but it does not present any clearly labeled pseudocode blocks or algorithms.
Open Source Code	No	The paper does not contain any explicit statements about releasing source code for the DATAVOYAGER system or provide a direct link to a code repository for its methodology.
Open Datasets	Yes	For example, Smith et al. (2005) explored the link between time preference and BMI from the National Longitudinal Surveys using several variables indicating the saving behavior of the respondents. ... National Longitudinal Survey of Youth data with a question on how incarceration and race affected wealth was fed to DATAVOYAGER; it is a question studied in (Zaw et al., 2016).
Dataset Splits	No	The paper does not explicitly provide specific percentages, sample counts, or methodologies for dataset splits (e.g., training, validation, test sets) to reproduce the data partitioning.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies	No	The paper mentions statistical analysis tools and Python for code generation, but it does not list specific software libraries or dependencies with their version numbers required to replicate the experiments.
Experiment Setup	No	The paper does not specify concrete experimental setup details such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or specific training configurations.