Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Estimating Learnability in the Sublinear Data Regime
Authors: Weihao Kong, Gregory Valiant
NeurIPS 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the practical viability of our approaches on synthetic and real data. All experiments were run in Matlab v2016b on a Mac Book Pro laptop, and the code is available from our websites. More details of the experiments are given in the supplementary material. |
| Researcher Affiliation | Academia | Weihao Kong Stanford University EMAIL Gregory Valiant Stanford University EMAIL |
| Pseudocode | Yes | Algorithm 1 Estimating Linearity, General covariance; Algorithm 2. Estimating Classification Error, General Covariance |
| Open Source Code | Yes | All experiments were run in Matlab v2016b on a Mac Book Pro laptop, and the code is available from our websites. |
| Open Datasets | Yes | Regression: NLP Experiments. This data is from Kaggle's Wine-Reviews dataset... Binary Classification: MNIST. We also evaluated Algorithm 2 for predicting the classification error on the MNIST dataset. |
| Dataset Splits | No | The paper frequently mentions 'test err Bayes-Opt' and 'training err Bayes-Opt' in figures for comparison, and for MNIST, states 'training on 50k datapoints and testing on the remaining datapoints' as part of ground truth calculation. However, it does not explicitly describe a validation split or methodology used for hyperparameter tuning or early stopping for its own models. |
| Hardware Specification | Yes | All experiments were run in Matlab v2016b on a Mac Book Pro laptop |
| Software Dependencies | Yes | All experiments were run in Matlab v2016b |
| Experiment Setup | Yes | Regression: Synthetic Data Experiments. In this experiment, n datapoints x₁, …, xₙ ∈ ℝᵈ are drawn from a multivariate Gaussian, N(0, Σ)... Binary Classification: Synthetic Data Experiments. ... β is a d-dimensional vector with β = 2... Each image is represented as a d = 784 dimensional vector, and the data are 0 centered and scaled so the largest singular value of the sample covariance matrix is 1. |
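
The preprocessing quoted in the Experiment Setup row (zero-centering the data and rescaling so that the largest singular value of the sample covariance matrix is 1) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' Matlab code. The function name `center_and_scale`, the 1/n covariance normalization, the synthetic covariance `Sigma`, and the sizes `n` and `d` are all assumptions made for the example.

```python
import numpy as np


def center_and_scale(X):
    """Zero-center the columns of X, then rescale so that the sample
    covariance (1/n) * Xc^T Xc has largest singular value 1.

    Hypothetical helper; the 1/n normalization is an assumption.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    n = Xc.shape[0]
    # The top singular value of (1/n) Xc^T Xc equals s_max(Xc)^2 / n.
    top = np.linalg.norm(Xc, 2) ** 2 / n
    return Xc / np.sqrt(top)


# Synthetic data in the spirit of the quoted setup: n points from N(0, Sigma).
# Sigma here is an arbitrary positive semidefinite matrix chosen for illustration.
rng = np.random.default_rng(0)
n, d = 200, 50
A = rng.standard_normal((d, d))
Sigma = A @ A.T / d
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)

Z = center_and_scale(X)
cov = Z.T @ Z / n
top_sv = np.linalg.norm(cov, 2)  # spectral norm of the sample covariance
```

After the call, `top_sv` should equal 1 up to floating-point error and each column of `Z` should have mean zero. With a 1/(n - 1) normalization instead, only the scaling constant changes.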