Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
PriorBoost: An Adaptive Algorithm for Learning from Aggregate Responses
Authors: Adel Javanmard, Matthew Fahrbach, Vahab Mirrokni
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study PriorBoost through extensive experiments in Section 7. This includes a comparison with random bagging for linear and logistic regression tasks, as well as a careful exploration into label differential privacy with Laplace noise for different privacy budgets. |
| Researcher Affiliation | Collaboration | ¹University of Southern California, ²Google Research. |
| Pseudocode | Yes | We give pseudocode for PriorBoost in Algorithm 1. |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the methodology described. |
| Open Datasets | No | The paper describes generating synthetic datasets for its experiments rather than using a publicly available or open dataset. For example: "We start by generating a dataset $(X, y)$ with $X \in \mathbb{R}^{n \times d}$ as follows. First, sample a ground truth model $\theta \sim \mathcal{N}_d(0, I)$. Next, generate a design matrix $X$ of $n$ i.i.d. feature vectors $x_i \sim \mathcal{N}_d(0, I)$ and get their responses $y = X\theta + \varepsilon$, where each $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ is i.i.d. Gaussian noise with $\sigma = 0.1$." |
| Dataset Splits | No | The paper specifies the generation of a test set but does not explicitly mention distinct training/validation/test splits or their percentages for reproducibility. It discusses training data and test data without detailing a validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | Our experiments use NumPy (Harris et al., 2020) and scikit-learn's LogisticRegression (Pedregosa et al., 2011). No specific version numbers for these software components are provided. |
| Experiment Setup | Yes | To study the convergence of PriorBoost and PBPrefix, we set T = 256. Then we set n = T · 4096 = 2^20 and d = 8. ... All three algorithms fit logistic regression models with binary cross-entropy loss and L2 regularization penalty λ = 10. |
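
The synthetic linear-regression data quoted in the Open Datasets row can be sketched in NumPy roughly as follows. This is a minimal illustration of the quoted generation procedure only, not the authors' code and not PriorBoost itself. The function name `generate_linear_dataset`, the `seed` argument, and the smaller example size `n = 2**12` are assumptions made for this sketch; the paper's reported setup uses n = 2^20 and d = 8.

```python
import numpy as np


def generate_linear_dataset(n, d, sigma=0.1, seed=0):
    """Hypothetical helper: sample (X, y, theta) as in the quoted description."""
    rng = np.random.default_rng(seed)    # seed is an assumption; the paper states none
    theta = rng.standard_normal(d)       # ground truth model, theta ~ N_d(0, I)
    X = rng.standard_normal((n, d))      # n i.i.d. feature vectors x_i ~ N_d(0, I)
    eps = sigma * rng.standard_normal(n) # i.i.d. Gaussian noise, eps_i ~ N(0, sigma^2)
    y = X @ theta + eps                  # responses y = X theta + eps
    return X, y, theta


# Smaller n than the paper's 2**20, chosen only to keep the example quick.
X, y, theta = generate_linear_dataset(n=2**12, d=8)
print(X.shape, y.shape, theta.shape)
```

Because no random seed or library versions are reported, a sketch like this reproduces the distribution of the data but not the exact samples used in the paper.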