Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Estimating Learnability in the Sublinear Data Regime

Authors: Weihao Kong, Gregory Valiant

NeurIPS 2018 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the practical viability of our approaches on synthetic and real data. All experiments were run in Matlab v2016b on a Mac Book Pro laptop, and the code is available from our websites. More details of the experiments are given in the supplementary material.
Researcher Affiliation	Academia	Weihao Kong Stanford University EMAIL Gregory Valiant Stanford University EMAIL
Pseudocode	Yes	Algorithm 1. Estimating Linearity, General Covariance; Algorithm 2. Estimating Classification Error, General Covariance
Open Source Code	Yes	All experiments were run in Matlab v2016b on a Mac Book Pro laptop, and the code is available from our websites.
Open Datasets	Yes	Regression: NLP Experiments. This data is from Kaggle's Wine-Reviews dataset... Binary Classification: MNIST. We also evaluated Algorithm 2 for predicting the classification error on the MNIST dataset.
Dataset Splits	No	The paper frequently mentions 'test err Bayes-Opt' and 'training err Bayes-Opt' in figures for comparison, and for MNIST, states 'training on 50k datapoints and testing on the remaining datapoints' as part of ground truth calculation. However, it does not explicitly describe a validation split or methodology used for hyperparameter tuning or early stopping for its own models.
Hardware Specification	Yes	All experiments were run in Matlab v2016b on a Mac Book Pro laptop
Software Dependencies	Yes	All experiments were run in Matlab v2016b
Experiment Setup	Yes	Regression: Synthetic Data Experiments. In this experiment, n datapoints x_1, ..., x_n ∈ R^d are drawn from a multivariate Gaussian, N(0, Σ)... Binary Classification: Synthetic Data Experiments. ... β is a d-dimensional vector with β = 2... Each image is represented as a d = 784 dimensional vector, and the data are 0 centered and scaled so the largest singular value of the sample covariance matrix is 1.
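
The preprocessing quoted in the Experiment Setup row (zero-centering, then scaling so the largest singular value of the sample covariance matrix is 1) can be sketched as follows. This is an illustrative NumPy sketch, not the authors' Matlab code. The function name `center_and_scale`, the sizes `n` and `d`, the random covariance `Sigma`, and the choice to normalize the sample covariance by n rather than n - 1 are all assumptions made for the example.

```python
import numpy as np

def center_and_scale(X):
    """Zero-center the columns of X, then rescale the data so that the
    sample covariance matrix has largest singular value 1.

    Assumption: the sample covariance is X^T X / n (normalizing by n,
    not n - 1); the quoted text does not say which is used.
    """
    Xc = X - X.mean(axis=0)              # subtract the per-feature mean
    n = Xc.shape[0]
    cov = Xc.T @ Xc / n                  # d x d sample covariance
    top = np.linalg.norm(cov, 2)         # spectral norm = largest singular value
    return Xc / np.sqrt(top)             # covariance scales by 1 / top

# Hypothetical synthetic data: n points drawn from N(0, Sigma) in R^d.
rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.standard_normal((d, d))
Sigma = A @ A.T                          # an arbitrary positive semidefinite covariance
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)

Z = center_and_scale(X)
cov_Z = Z.T @ Z / n                      # covariance of the rescaled data
```

Dividing the data by the square root of the top singular value divides the covariance by that value itself, which is why the rescaled covariance `cov_Z` has spectral norm 1.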