Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Concentration and excess risk bounds for imbalanced classification with synthetic oversampling

Authors: Touqeer Ahmad, Mohammadreza Mousavi Kalan, François Portier, Gilles Stupfler

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Numerical experiments are provided to illustrate and support the theoretical findings.
Researcher Affiliation	Academia	(1) Univ Rennes, Ensai, CNRS, CREST UMR 9194, F-35000 Rennes, France; (2) Univ Angers, CNRS, LAREMA, SFR MATHSTIC, F-49000 Angers, France
Pseudocode	Yes	Algorithm 1: SMOTE; Algorithm 2: KDEO
Open Source Code	Yes	While the code is hosted on a GitHub repository, the link is withheld to maintain anonymity for the review process. Nonetheless, the paper includes all necessary implementation and evaluation details relevant to its main claims.
Open Datasets	Yes	Real datasets used are publicly available through the OpenML and UCI machine learning repositories.
Dataset Splits	Yes	Each dataset is split into training and validation sets using a 70:30 ratio.
Hardware Specification	No	The paper does not provide detailed information about the computational resources used for the experiments (e.g., type of compute workers, memory, or execution time).
Software Dependencies	No	The paper mentions 'imbalanced-learn: A Python toolbox' [Lemaitre et al., 2017] but does not specify its version or the versions of other software components used for the experiments.
Experiment Setup	Yes	For SMOTE, we consider the default choice k = 5 neighbors. For KDEO, we consider the matrix-valued bandwidth $H_1$ such that $H_1^2 = n_1^{-2/(d+4)} C_1$, following from Scott's rule, where $C_1$ is the covariance matrix computed from the minority class samples. ... Then apply the KNN (resp. KS) algorithm with parameter K = n (resp. $s = S_j$ obtained from Scott's rule $S_j^2 = n_0^{-2/(d+4)} C_j$, $j = 0, 1$, where $C_j$ is the covariance matrix of class $j$).
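
The Scott's rule bandwidth quoted in the Experiment Setup row can be illustrated with a short NumPy sketch. This is not the authors' code, which is not linked here. It assumes a Gaussian kernel, the unbiased sample covariance returned by `np.cov`, and oversampling by picking a minority point uniformly at random and adding noise with covariance $H_1^2$. The function names `scott_bandwidth_sq` and `kde_oversample` and the toy data are hypothetical.

```python
import numpy as np


def scott_bandwidth_sq(X):
    """Return H^2 = n^{-2/(d+4)} * C for samples X of shape (n, d)."""
    n, d = X.shape
    # Sample covariance of the columns; reshape keeps it 2-D when d == 1.
    C = np.cov(X, rowvar=False).reshape(d, d)
    return n ** (-2.0 / (d + 4)) * C


def kde_oversample(X_min, m, rng):
    """Draw m synthetic points from a Gaussian KDE fitted to X_min.

    Assumption: Gaussian kernel. Each draw picks one minority point
    uniformly at random and adds N(0, H^2) noise around it.
    """
    n, d = X_min.shape
    H2 = scott_bandwidth_sq(X_min)
    centers = X_min[rng.integers(0, n, size=m)]
    noise = rng.multivariate_normal(np.zeros(d), H2, size=m)
    return centers + noise


rng = np.random.default_rng(0)
X_min = rng.normal(size=(50, 3))         # hypothetical minority-class sample
X_syn = kde_oversample(X_min, 200, rng)  # 200 synthetic minority points
```

Here `X_syn` has one row per synthetic point and the same number of columns as `X_min`. Because the scaling factor $n_1^{-2/(d+4)}$ shrinks as the minority sample grows, the synthetic points concentrate more tightly around the observed ones for larger $n_1$.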