Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Delegated Classification
Authors: Eden Saig, Inbal Talgam-Cohen, Nir Rosenfeld
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate that budget-optimal contracts can be constructed using small-scale data, leveraging recent advances in the study of learning curves and scaling laws. Performance and economic outcomes are evaluated using synthetic and real-world classification tasks. 4 Experiments |
| Researcher Affiliation | Academia | Eden Saig, Inbal Talgam-Cohen, Nir Rosenfeld Technion Israel Institute of Technology Haifa, Israel EMAIL |
| Pseudocode | No | The paper describes the 'single binding action (SBA) algorithm' in text but does not include a structured pseudocode block or algorithm box. |
| Open Source Code | Yes | Code is available at: https://github.com/edensaig/delegated-classification. |
| Open Datasets | Yes | We base our experiments on the recently curated Learning Curves Database (LCDB) [43], which includes a large collection of stochastic learning curves for multiple classification datasets and methods. Here we focus primarily on the popular MNIST dataset [39] as our case study... |
| Dataset Splits | Yes | expected performance is estimated by the empirical average on an additional held-out validation set $V \sim D^m$ of size $m$, as $\mathrm{acc}_V(h) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{1}[h(x_i) = y_i]$, which is a consistent and unbiased estimator of $\mathrm{acc}_D(h)$. For each trained classifier, each accuracy point on the learning curve is estimated using 5,000 held-out samples. |
| Hardware Specification | Yes | All experiments were run on a single laptop, with 16GB of RAM, M1 Pro processor, and with no GPU support. |
| Software Dependencies | No | The paper mentions software like Pyomo, GLPK, and scikit-learn, but does not specify their version numbers for reproducibility. |
| Experiment Setup | Yes | action costs are set to fixed per-unit cost, i.e., $c_n = n$; and (iii) the distribution $F$ over outcomes … is associated with a binomial mixture distribution, resulting from applying bootstrap sampling to empirical error measurements: $\frac{1}{R} \sum_{r=1}^{R} \mathrm{Binomial}(m, a_n^{r,\mathrm{Alg},D})$. In particular, we experiment with fitting parametric power-law curves of the form $\mathbb{E}[\alpha_n] = a - b n^{-c}$, which have been shown to provide good fit in various scenarios both empirically and theoretically [49, 34, 11]. We define $r$ as the number of samples per $n$ (so low $r$ means larger $n_0$). Then, for a given $r$, we set $n_0$ such that $\sum_{n \le n_0} r \cdot n \le k$ (i.e., such that the total number of used samples does not exceed $k$). |
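
The quantities quoted in the Dataset Splits and Experiment Setup rows can be illustrated with a minimal Python sketch. It is not taken from the authors' repository: the function names (`holdout_accuracy`, `binomial_mixture_pmf`, `largest_n0`) and the candidate grid of training-set sizes are hypothetical, chosen only to show the held-out accuracy estimate, the binomial mixture over bootstrap accuracies, and the budget rule for choosing n0.

```python
from math import comb


def holdout_accuracy(predictions, labels):
    """Empirical accuracy: fraction of held-out samples predicted correctly."""
    if len(predictions) != len(labels) or len(labels) == 0:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def binomial_mixture_pmf(m, accuracies):
    """Probability of j correct answers out of m, for j = 0..m.

    Averages Binomial(m, a) over the given accuracies, which here stand in
    for bootstrap resamples of an empirical accuracy measurement (assumption).
    """
    if not accuracies:
        raise ValueError("need at least one accuracy value")
    count = len(accuracies)
    return [
        sum(comb(m, j) * a**j * (1 - a) ** (m - j) for a in accuracies) / count
        for j in range(m + 1)
    ]


def largest_n0(candidate_sizes, r, k):
    """Largest n0 on a (hypothetical) grid with sum over n <= n0 of r * n <= k.

    Returns None when even the smallest size exceeds the budget k.
    """
    total = 0
    n0 = None
    for n in sorted(candidate_sizes):
        total += r * n
        if total > k:
            break
        n0 = n
    return n0


if __name__ == "__main__":
    print(holdout_accuracy([1, 0, 1, 1], [1, 1, 1, 1]))
    print(binomial_mixture_pmf(2, [0.5]))
    print(largest_n0([100, 200, 400, 800], r=2, k=1500))
```

On this illustrative grid, lowering `r` from 2 to 1 under the same budget `k` moves the selected n0 from 400 to 800, matching the remark that a low r means a larger n0.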