Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Robust Minimax Boosting with Performance Guarantees

Authors: Santiago Mazuelas, Veronica Alvarez

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The experimental results corroborate that RMBoost is not only resilient to label noise but can also provide strong classification accuracy. [...] The experiments show that RMBoost can outperform existing methods in the presence of noisy labels and also achieve strong classification accuracies without noise.
Researcher Affiliation	Academia	(1) Basque Center of Applied Mathematics (BCAM); (2) Massachusetts Institute of Technology (MIT); (3) IKERBASQUE-Basque Foundation for Science
Pseudocode	Yes	Algorithm 1 RMBoost learning algorithm
Open Source Code	Yes	The code implementing the methods presented and reproducing the experiments can be found at https://github.com/MachineLearningBCAM/RMBoost-NeurIPS-2025. The supplementary materials provide additional details and results in Appendix H, including running times assessments and the results of all the boosting methods in all label noise cases.
Open Datasets	Yes	We utilize 11 publicly available datasets that have often been used as benchmarks for boosting methods: Diabetes, German Numer, Credit, Blood transfusion, Titanic, Raisin, QSAR, Climate, Susy, Higgs, and Forest covertype. These datasets can be found in the UCI repository [41] and in www.kaggle.com. [...] [41] Dheeru Dua and Casey Graff. UCI Machine Learning Repository, 2017.
Dataset Splits	Yes	The results in Table 1 in the paper as well as Table 3 below are obtained carrying out 100 random and stratified train/test partitions with 10% test samples. [...] Figures 4a and 4b are obtained computing for each noise level the classification error over 500 random stratified partitions with 10% test samples.
Hardware Specification	No	The text mentions "the absolute running times in all the methods are in the order of seconds in a regular desktop machine" but does not specify any particular hardware components like CPU, GPU, or memory models.
Software Dependencies	No	Methods RobustBoost, AdaBoost, LogitBoost, GentleBoost, and LPBoost are implemented using their Matlab codes, methods XGB-Quad and BrownBoost are implemented using the Python libraries XGBoost https://xgboost.readthedocs.io and BrownBoost https://github.com/lapis-zero09/BrownBoost, respectively, and method Robust-GBDT is implemented using the code provided by the authors [38]. While programming languages and libraries are mentioned, specific version numbers are not provided for any of them.
Experiment Setup	Yes	Input: Training samples {(x_i, y_i)}_{i=1}^n, parameters λ, K [...] In particular, we use simplex-based solvers for linear optimization with tolerances for constraints and dual feasibility of 10^-3, and we take λ = 1/sqrt(n) in all the numerical results.
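
The evaluation protocol quoted in the Dataset Splits and Experiment Setup rows (repeated random stratified train/test partitions with 10% test samples, and λ = 1/sqrt(n) with n the number of training samples) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names `stratified_split` and `repeated_partitions`, the seed handling, and the per-class rounding rule are all assumptions made for the example.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.1, rng=None):
    """Return (train_idx, test_idx) with class proportions roughly preserved.

    Hypothetical helper: each class is shuffled separately and about
    `test_fraction` of its indices are assigned to the test set.
    """
    if rng is None:
        rng = random.Random()
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # Assumed rounding rule: nearest integer, at least one test sample per class.
        n_test = max(1, round(test_fraction * len(shuffled)))
        test_idx.extend(shuffled[:n_test])
        train_idx.extend(shuffled[n_test:])
    return sorted(train_idx), sorted(test_idx)


def repeated_partitions(labels, n_repeats=100, test_fraction=0.1, seed=0):
    """Yield (train_idx, test_idx, lam) for each random stratified partition."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        train_idx, test_idx = stratified_split(labels, test_fraction, rng)
        # lambda = 1/sqrt(n), with n taken as the training-set size.
        lam = 1.0 / math.sqrt(len(train_idx))
        yield train_idx, test_idx, lam


if __name__ == "__main__":
    # Toy binary labels standing in for a real dataset.
    toy_labels = [1] * 60 + [-1] * 40
    splits = list(repeated_partitions(toy_labels, n_repeats=100))
    train_idx, test_idx, lam = splits[0]
    print(len(train_idx), len(test_idx), lam)
```

Averaging a classifier's test error over the partitions produced by `repeated_partitions` would mirror the structure of the reported protocol; the classifier itself, and the 500-partition setting used for the noise-level figures, are left to the caller.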