Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Extrapolating Expected Accuracies for Large Multi-Class Problems
Authors: Charles Zheng, Rakesh Achanta, Yuval Benjamini
JMLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method using simulations in Section 4. In Section 5, we demonstrate our method on a facial recognition problem, as well as an optical character recognition problem. We ran simulations to check how the proposed extrapolation method, ClassExReg, performs in different settings. The results are displayed in Figure 7. We demonstrate the extrapolation of average accuracy in two data examples: (i) predicting the accuracy of a face recognition system on a large set of labels from the system's accuracy on a smaller subset, and (ii) extrapolating the performance of various classifiers on an optical character recognition (OCR) problem in the Telugu script, which has over 400 glyphs. |
| Researcher Affiliation | Academia | Section on Functional Imaging Methods National Institute of Mental Health Bethesda, MD; Department of Statistics Stanford University Palo Alto, CA; Department of Statistics The Hebrew University of Jerusalem, Jerusalem, Israel |
| Pseudocode | No | The paper only describes the methodology in prose, without a distinct 'Pseudocode' or 'Algorithm' section, or code-like formatted procedures. |
| Open Source Code | Yes | Code for the methods and the simulations can be found at https://github.com/snarles/ClassEx. |
| Open Datasets | Yes | The face-recognition example takes data from the Labeled Faces in the Wild data set (Huang et al. (2007)), where we selected the 1672 individuals with at least 2 face photos. In the Telugu optical character recognition example (Achanta and Hastie (2015)), we consider the use of three different classifiers |
| Dataset Splits | Yes | The full data consists of 400 classes with 50 training and 50 test observations for each class. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments, only mentioning the types of models trained. |
| Software Dependencies | No | For each photo x, a 128-dimensional feature vector g(x) is obtained as follows. The computer vision library dlib is used to detect landmarks in x, and to apply a nonlinear transformation to align x to a template. The aligned photograph is then downsampled to a 96x96 image. The downsampled image is fed into a pre-trained deep convolutional neural network to obtain the 128-dimensional feature vector g(x). More details are found in Amos et al. (2016). The paper also mentions "the stats package in the R statistical computing environment." It thus names several software components (dlib, OpenFace, R, the stats package) but provides no version numbers for any of them, which reproducibility requires. |
| Experiment Setup | Yes | The network architecture is as follows: 48x48-4C3-MP2-6C3-8C3-MP2-32C3-50C3-MP2-200C3-SM. In the simulation, we use a grid h = {0.1, 0.2, ..., 1} for bandwidth selection. The noise-level parameter σ determines the difficulty of classification. |
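The compact architecture string quoted in the Experiment Setup row uses a common shorthand. A minimal sketch of one plausible reading follows; the interpretation ("\<n\>C\<k\>" = a convolution with n feature maps and a k×k kernel, "MP\<s\>" = s×s max pooling, "SM" = softmax output, leading "48x48" = input resolution) is an assumption, not stated in the row itself:

```python
import re

# Quoted verbatim from the paper's Experiment Setup description.
ARCH = "48x48-4C3-MP2-6C3-8C3-MP2-32C3-50C3-MP2-200C3-SM"

def describe(arch: str) -> list[str]:
    """Decode the shorthand into human-readable layer descriptions.

    The token meanings below are an assumed reading of the notation.
    """
    layers = []
    for tok in arch.split("-"):
        if re.fullmatch(r"\d+x\d+", tok):
            layers.append(f"input image {tok}")
        elif m := re.fullmatch(r"(\d+)C(\d+)", tok):
            layers.append(f"conv: {m[1]} feature maps, {m[2]}x{m[2]} kernel")
        elif m := re.fullmatch(r"MP(\d+)", tok):
            layers.append(f"max pool {m[1]}x{m[1]}")
        elif tok == "SM":
            layers.append("softmax output")
        else:
            layers.append(f"unrecognized token: {tok}")
    return layers

for line in describe(ARCH):
    print(line)
```

Under this reading the network has eleven tokens: an input spec, six convolutional layers, three pooling layers, and a softmax output.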