Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Predicting Classification Accuracy When Adding New Unobserved Classes
Authors: Yuli Slavutsky, Yuval Benjamini
ICLR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5 we verify the performance of CleaneX on simulations and real data-sets. We find it achieves better overall predictions of the expected accuracy, and very few large errors, compared to its competitors. |
| Researcher Affiliation | Academia | Yuli Slavutsky, Yuval Benjamini Department of Statistics and Data Science The Hebrew University of Jerusalem Jerusalem, Israel EMAIL |
| Pseudocode | Yes | Algorithm 1: CleaneX |
| Open Source Code | Yes | Code is publicly available at: https://github.com/YuliSl/CleaneX |
| Open Datasets | Yes | Experiment 1 Object Detection (CIFAR-100) We use the CIFAR dataset (Krizhevsky et al., 2009)... Experiment 2 Face Recognition (LFW) We use the Labeled Faces in the Wild dataset (Huang et al., 2007)... Experiment 3 Brain Decoding (fMRI) We analyze a mind-reading task described by Kay et al. (2008) |
| Dataset Splits | Yes | In each repetition we sub-sample k1 classes and predict the accuracy at 2 ≤ k ≤ k2 classes... In Algorithm 1, the training set is composed of N examples x from the set of k1 available classes... for k = 2, . . . , k1 do Compute Ak1 k end... In Experiment 3 Brain Decoding (fMRI), we use nt = 750 images and their response vectors to fit an embedding... The remaining n - nt = k2 = 1000 examples are used as an evaluation set. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU, CPU models, memory) used for running the experiments. |
| Software Dependencies | Yes | All the code in this work was implemented in Python 3.6. For the CleaneX algorithm we used TensorFlow 1.14; for the regression based method we used the scipy.optimize package with the Newton-CG method; kernel density estimation was implemented using the density function from the stats library in R, imported to Python through the rpy2 package. |
| Experiment Setup | Yes | For our method, we use in all the experiments an identical feed-forward neural network with two hidden layers of sizes 512 and 128, a rectified linear activation between the layers, and a sigmoid applied on the output. We train the network according to Algorithm 1 for J = 10,000 iterations with learning rate of η = 10^-4 using Adam optimizer (Kingma & Ba, 2014). |
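
The network shape quoted in the Experiment Setup row can be illustrated with a minimal, dependency-free sketch. This is not the authors' TensorFlow 1.14 implementation: the input dimension, the single output unit, the weight initialization, and all function names are assumptions made here for illustration, and the ReLU is assumed to follow each hidden layer. Only the forward pass is shown; the training loop of Algorithm 1 and the Adam optimizer are not implemented.

```python
import math
import random

# Values quoted in the Experiment Setup row.
HIDDEN_SIZES = (512, 128)   # widths of the two hidden layers
LEARNING_RATE = 1e-4        # eta; intended for Adam, which is not implemented here
NUM_ITERATIONS = 10_000     # J; the training loop is not implemented here


def init_layer(n_in, n_out, rng):
    """Return (weights, biases); the uniform initialization is an assumption."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Affine map: one output per weight row, plus the matching bias."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def build_network(input_dim, output_dim=1, seed=0):
    """Build layers input_dim -> 512 -> 128 -> output_dim (output_dim is assumed)."""
    rng = random.Random(seed)
    sizes = [input_dim, *HIDDEN_SIZES, output_dim]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def forward(network, x):
    """ReLU after each hidden layer, sigmoid on the output layer."""
    h = x
    for layer in network[:-1]:
        h = relu(dense(h, layer))
    return sigmoid(dense(h, network[-1]))
```

For example, `forward(build_network(input_dim=8), [0.5] * 8)` returns a one-element list whose value lies strictly between 0 and 1, because of the sigmoid on the output. The authors' actual code is in the repository linked in the Open Source Code row.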