Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Concentration and excess risk bounds for imbalanced classification with synthetic oversampling

Authors: Touqeer Ahmad, Mohammadreza Mousavi Kalan, François Portier, Gilles Stupfler

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments are provided to illustrate and support the theoretical findings.
Researcher Affiliation | Academia | (1) Univ Rennes, Ensai, CNRS, CREST UMR 9194, F-35000 Rennes, France; (2) Univ Angers, CNRS, LAREMA, SFR MATHSTIC, F-49000 Angers, France
Pseudocode | Yes | Algorithm 1: SMOTE; Algorithm 2: KDEO
Open Source Code | Yes | While the code is hosted on a GitHub repository, the link is withheld to maintain anonymity for the review process. Nonetheless, the paper includes all necessary implementation and evaluation details relevant to its main claims.
Open Datasets | Yes | Real datasets used are publicly available through the OpenML and UCI machine learning repositories.
Dataset Splits | Yes | Each dataset is split into training and validation sets using a 70:30 ratio.
Hardware Specification | No | The paper does not provide detailed information about the computational resources used for the experiments (e.g., type of compute workers, memory, or execution time).
Software Dependencies | No | The paper mentions 'imbalanced-learn: A Python toolbox' [Lemaitre et al., 2017] but does not specify its version or the versions of other software components used for the experiments.
Experiment Setup | Yes | For SMOTE, we consider the default choice k = 5 neighbors. For KDEO, we consider the matrix-valued bandwidth $H_1$ such that $H_1^2 = n_1^{-2/(d+4)} C_1$, following from Scott's rule, where $C_1$ is the covariance matrix computed from the minority class samples. ... Then apply the KNN (resp. KS) algorithm with parameter K = n (resp. s = $S_j$ obtained from Scott's rule $S_j^2 = n_0^{-2/(d+4)} C_j$, j = 0, 1, where $C_j$ is the covariance matrix of class j).
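
The Experiment Setup entry describes two oversamplers: SMOTE with k = 5 neighbors, and KDEO with a Scott's-rule bandwidth matrix whose square is n_1^{-2/(d+4)} C_1. The sketch below illustrates both on toy data using only NumPy. It is not the authors' code. The function names are hypothetical, the Gaussian kernel used for KDEO is an assumption (the entry above does not name the kernel), and the SMOTE variant is a simplified version of the standard interpolation scheme.

```python
import numpy as np


def scott_bandwidth_sq(X):
    """Squared bandwidth matrix H^2 = n^{-2/(d+4)} * C (Scott's rule)."""
    n, d = X.shape
    C = np.cov(X, rowvar=False).reshape(d, d)  # sample covariance of X
    return n ** (-2.0 / (d + 4)) * C


def kde_oversample(X_min, m, rng):
    """Draw m synthetic points from a kernel density estimate of X_min.

    Assumption: Gaussian kernel. Sampling from such a KDE amounts to
    picking a random minority point and adding N(0, H^2) noise.
    """
    H2 = scott_bandwidth_sq(X_min)
    centers = X_min[rng.integers(0, len(X_min), size=m)]
    noise = rng.multivariate_normal(np.zeros(X_min.shape[1]), H2, size=m)
    return centers + noise


def smote_oversample(X_min, m, rng, k=5):
    """Simplified SMOTE: interpolate between a minority point and one of
    its k nearest minority-class neighbors (requires len(X_min) > k)."""
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=m)
    picked = neighbors[base, rng.integers(0, k, size=m)]
    lam = rng.random((m, 1))  # interpolation weight in [0, 1)
    return X_min[base] + lam * (X_min[picked] - X_min[base])


# Toy minority class: 40 points in 3 dimensions (illustrative data only).
rng = np.random.default_rng(0)
X_min = rng.normal(size=(40, 3))
H2 = scott_bandwidth_sq(X_min)
X_kdeo = kde_oversample(X_min, 100, rng)
X_smote = smote_oversample(X_min, 100, rng)
```

One difference the sketch makes visible: SMOTE points are convex combinations of existing minority points, so they stay inside the bounding box of the observed minority sample, whereas kernel-based draws can fall outside it.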