Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Concentration and excess risk bounds for imbalanced classification with synthetic oversampling
Authors: Touqeer Ahmad, Mohammadreza Mousavi Kalan, François Portier, Gilles Stupfler
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments are provided to illustrate and support the theoretical findings. |
| Researcher Affiliation | Academia | ¹Univ Rennes, Ensai, CNRS, CREST UMR 9194, F-35000 Rennes, France; ²Univ Angers, CNRS, LAREMA, SFR MATHSTIC, F-49000 Angers, France |
| Pseudocode | Yes | Algorithm 1: SMOTE; Algorithm 2: KDEO |
| Open Source Code | Yes | While the code is hosted on a GitHub repository, the link is withheld to maintain anonymity for the review process. Nonetheless, the paper includes all necessary implementation and evaluation details relevant to its main claims. |
| Open Datasets | Yes | Real datasets used are publicly available through the OpenML and UCI machine learning repositories. |
| Dataset Splits | Yes | Each dataset is split into training and validation sets using a 70:30 ratio. |
| Hardware Specification | No | The paper does not provide detailed information about the computational resources used for the experiments (e.g., type of compute workers, memory, or execution time). |
| Software Dependencies | No | The paper mentions 'imbalanced-learn: A Python toolbox' [Lemaitre et al., 2017] but does not specify its version or the versions of other software components used for the experiments. |
| Experiment Setup | Yes | For SMOTE, we consider the default choice k = 5 neighbors. For KDEO, we consider the matrix-valued bandwidth $H_1$ such that $H_1^2 = n_1^{-2/(d+4)} C_1$, following from Scott's rule, where $C_1$ is the covariance matrix computed from the minority class samples. ... Then apply the KNN (resp. KS) algorithm with parameter K = n (resp. s = $S_j$ obtained from Scott's rule $S_j^2 = n_0^{-2/(d+4)} C_j$, j = 0, 1, where $C_j$ is the covariance matrix of class j). |
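
The Scott's-rule bandwidth quoted in the Experiment Setup row can be illustrated with a short sketch. This is not the authors' code, which is not linked here: the Gaussian kernel, the function names `scott_bandwidth_sq` and `kde_oversample`, and the toy minority sample are all assumptions made for illustration. The sketch computes the squared bandwidth as n^(-2/(d+4)) times the sample covariance of the minority class, then draws synthetic points by picking a minority sample at random and adding Gaussian noise with that covariance.

```python
import numpy as np


def scott_bandwidth_sq(X):
    """Squared bandwidth matrix n^(-2/(d+4)) * C, where C is the
    sample covariance of the rows of X (hypothetical helper)."""
    n, d = X.shape
    C = np.atleast_2d(np.cov(X, rowvar=False))
    return n ** (-2.0 / (d + 4)) * C


def kde_oversample(X_min, n_new, rng):
    """Draw n_new points from a kernel density estimate of X_min.

    Assumption: a Gaussian kernel, so each draw is a randomly chosen
    minority sample plus N(0, H^2) noise.
    """
    n1, d = X_min.shape
    H2 = scott_bandwidth_sq(X_min)
    # Pick kernel centers uniformly among the minority samples.
    centers = X_min[rng.integers(0, n1, size=n_new)]
    # Perturb each center with noise whose covariance is H^2.
    noise = rng.multivariate_normal(np.zeros(d), H2, size=n_new)
    return centers + noise


# Toy data standing in for a minority class (not from the paper).
rng = np.random.default_rng(0)
X_min = rng.normal(size=(50, 3))
X_syn = kde_oversample(X_min, 200, rng)
```

The synthetic rows in `X_syn` would then be appended to the training split to rebalance the classes before fitting a classifier.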