Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Extrapolating Expected Accuracies for Large Multi-Class Problems
Authors: Charles Zheng, Rakesh Achanta, Yuval Benjamini
JMLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method using simulations in Section 4. In Section 5, we demonstrate our method on a facial recognition problem, as well as an optical character recognition problem. We ran simulations to check how the proposed extrapolation method, ClassExReg, performs in different settings. The results are displayed in Figure 7. We demonstrate the extrapolation of average accuracy in two data examples: (i) predicting the accuracy of a face recognition system on a large set of labels from the system's accuracy on a smaller subset, and (ii) extrapolating the performance of various classifiers on an optical character recognition (OCR) problem in the Telugu script, which has over 400 glyphs. |
| Researcher Affiliation | Academia | Section on Functional Imaging Methods National Institute of Mental Health Bethesda, MD; Department of Statistics Stanford University Palo Alto, CA; Department of Statistics The Hebrew University of Jerusalem, Jerusalem, Israel |
| Pseudocode | No | The paper only describes the methodology in prose, without a distinct 'Pseudocode' or 'Algorithm' section, or code-like formatted procedures. |
| Open Source Code | Yes | Code for the methods and the simulations can be found at https://github.com/snarles/ClassEx. |
| Open Datasets | Yes | The face-recognition example takes data from the Labeled Faces in the Wild data set (Huang et al. (2007)), where we selected the 1672 individuals with at least 2 face photos. In the Telugu optical character recognition example (Achanta and Hastie (2015)), we consider the use of three different classifiers |
| Dataset Splits | Yes | The full data consists of 400 classes with 50 training and 50 test observations for each class. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments, only mentioning the types of models trained. |
| Software Dependencies | No | For each photo x, a 128-dimensional feature vector g(x) is obtained as follows. The computer vision library dlib is used to detect landmarks in x, and to apply a nonlinear transformation to align x to a template. The aligned photograph is then downsampled to a 96x96 image. The downsampled image is fed into a pre-trained deep convolutional neural network to obtain the 128-dimensional feature vector g(x). More details are found in Amos et al. (2016). The paper also mentions "the stats package in the R statistical computing environment." It thus names several software components (dlib, OpenFace, R, the stats package) but provides no version numbers for any of them, which reproducibility requires. |
| Experiment Setup | Yes | The network architecture is as follows: 48x48-4C3-MP2-6C3-8C3-MP2-32C3-50C3-MP2-200C3-SM. In the simulation, we use a grid h = {0.1, 0.2, ..., 1} for bandwidth selection. The noise-level parameter σ determines the difficulty of classification. |
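The compact architecture string quoted in the Experiment Setup row uses a common shorthand. A minimal sketch of one plausible reading follows; the interpretation ("\<n\>C\<k\>" = a convolution with n feature maps and a k×k kernel, "MP\<s\>" = s×s max pooling, "SM" = softmax output, leading "48x48" = input resolution) is an assumption, not stated in the row itself:

```python
import re

# Quoted verbatim from the paper's Experiment Setup description.
ARCH = "48x48-4C3-MP2-6C3-8C3-MP2-32C3-50C3-MP2-200C3-SM"

def describe(arch: str) -> list[str]:
    """Decode the shorthand into human-readable layer descriptions.

    The token meanings below are an assumed reading of the notation.
    """
    layers = []
    for tok in arch.split("-"):
        if re.fullmatch(r"\d+x\d+", tok):
            layers.append(f"input image {tok}")
        elif m := re.fullmatch(r"(\d+)C(\d+)", tok):
            layers.append(f"conv: {m[1]} feature maps, {m[2]}x{m[2]} kernel")
        elif m := re.fullmatch(r"MP(\d+)", tok):
            layers.append(f"max pool {m[1]}x{m[1]}")
        elif tok == "SM":
            layers.append("softmax output")
        else:
            layers.append(f"unrecognized token: {tok}")
    return layers

for line in describe(ARCH):
    print(line)
```

Under this reading the network has eleven tokens: an input spec, six convolutional layers, three pooling layers, and a softmax output.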