Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Revisiting Agnostic Boosting
Authors: Arthur da Cunha, Mikael Møller Høgsgaard, Andrea Paudice, Yuxin Sun
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we propose a new agnostic boosting algorithm with substantially improved sample complexity compared to prior works under very general assumptions. Our approach is based on a reduction to the realizable case, followed by a margin-based filtering of high-quality hypotheses. Furthermore, we show a nearly-matching lower bound, settling the sample complexity of agnostic boosting up to logarithmic factors. |
| Researcher Affiliation | Academia | Arthur da Cunha (Aarhus University), Mikael Møller Høgsgaard (Aarhus University), Andrea Paudice (Aarhus University), Yuxin Sun (Aarhus University) |
| Pseudocode | Yes | Algorithm 1: Modified ADABOOST; Algorithm 2: Agnostic boosting algorithm |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to any code repositories for the described methodology. The paper primarily focuses on the theoretical aspects of the algorithms. |
| Open Datasets | No | The paper refers to abstract concepts like 'input space X', 'distribution D over X × Y', and 'training sequence S = (x1, f(x1)), ..., (xm, f(xm))' for its theoretical analysis. It does not mention or use any specific publicly available or open datasets for empirical evaluation. |
| Dataset Splits | No | The paper refers to splitting a generic training sequence 'S' into 'S1, S2, and S3' for its algorithmic steps ('Let S1, S2, and S3 be the first, second, and third thirds of S, respectively'). However, this is in the context of theoretical analysis of the algorithm itself, not for empirical evaluation using a concrete dataset. No specific dataset splits (percentages, sample counts, or predefined benchmark splits) are provided for experimental reproduction. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run experiments. This is consistent with the paper's theoretical nature, which focuses on sample complexity and algorithmic properties rather than empirical performance. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks, or solvers) that would be needed to replicate experiments. This is expected given the theoretical nature of the work, which focuses on mathematical proofs and algorithmic design rather than empirical implementation details. |
| Experiment Setup | No | The paper does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, number of epochs), optimizer settings, or model initialization strategies. The focus is on the theoretical properties and bounds of the proposed boosting algorithm. |
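
The Dataset Splits row quotes the paper's step 'Let S1, S2, and S3 be the first, second, and third thirds of S, respectively'. A minimal sketch of such a split is given below for illustration only; it is not taken from the paper. The function name `split_into_thirds` and the handling of lengths not divisible by three (leftover items go to the last part) are assumptions.

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_into_thirds(sample: Sequence[T]) -> Tuple[Sequence[T], Sequence[T], Sequence[T]]:
    """Split a training sequence S into consecutive parts S1, S2, S3.

    Assumption (not stated in the source): when len(sample) is not a
    multiple of 3, the leftover items are placed in the last part.
    """
    k = len(sample) // 3  # size of the first two parts
    return sample[:k], sample[k:2 * k], sample[2 * k:]


# Hypothetical labeled sequence S = (x1, f(x1)), ..., (xm, f(xm)) with m = 7
# and an illustrative labeling function f(x) = +1 if x is even, else -1.
S = [(x, 1 if x % 2 == 0 else -1) for x in range(7)]
S1, S2, S3 = split_into_thirds(S)
```

With seven examples, this sketch puts two examples in each of S1 and S2 and the remaining three in S3. A different convention for the remainder would serve equally well as an illustration.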