Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Under-bagging Nearest Neighbors for Imbalanced Classification

Authors: Hanyuan Hang, Yuchao Cai, Hanfang Yang, Zhouchen Lin

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On the practical side, we conduct numerical experiments to verify the theoretical results on the benefits of the under-bagging technique by the promising AM performance and efficiency of our proposed algorithm.
Researcher Affiliation | Academia | Hanyuan Hang EMAIL, Department of Applied Mathematics, University of Twente, 7522 NB Enschede, The Netherlands; Yuchao Cai EMAIL, School of Statistics, Renmin University of China, 100872 Beijing, China; Hanfang Yang EMAIL, Center for Applied Statistics, School of Statistics, Renmin University of China, 100872 Beijing, China; Zhouchen Lin EMAIL, Key Lab. of Machine Perception (MoE), School of Artificial Intelligence, Peking University; Institute for Artificial Intelligence, Peking University, 100872 Beijing, China; Peng Cheng Laboratory, 518055 Shenzhen, Guangdong, China
Pseudocode | Yes | Algorithm 1: Under-bagging k-NN Classifier for Imbalanced Classification
Open Source Code | No | The text does not provide any explicit statement or link to source code for the methodology described in the paper; it only mentions using third-party implementations such as scikit-learn and imbalanced-learn.
Open Datasets | Yes | These imbalanced datasets come from the UCI Machine Learning Repository (Dua and Graff, 2017).
Dataset Splits | Yes | We generate 20,000 positive samples and 200 negative samples in each run for training, and 200,000 positive samples and 2,000 negative samples for testing.
Hardware Specification | Yes | All experiments are conducted on a 64-bit machine with a 24-core Intel Xeon 2.0 GHz CPU (E5-4620) and 64 GB main memory.
Software Dependencies | No | The paper mentions using "scikit-learn" and "imbalanced-learn" implementations in Python, but does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | There are three hyper-parameters in the under-bagging k-NN for imbalanced classification: the bagging rounds B, the number of nearest neighbors k, and the expectation of the subsample size s. ... we fix the expected sub-sample size s = Mn(1), and vary the bagging rounds B ∈ {1, 2, 5, 10, 20, 50} and the number of neighbors k ∈ {1, 2, …, 30}. ... we fix the bagging rounds B = 20, and explore the performance under different sub-sample sizes s = a · Mn(1) with a ∈ {0.2, 0.4, …, 1.0}.
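Since no official implementation is linked from the paper, the following is a minimal NumPy sketch (not the authors' released code) of the under-bagging k-NN idea with the three hyper-parameters B, k, and s described above; function names and defaults here are illustrative assumptions.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Plain k-nearest-neighbour majority vote (Euclidean distance).
    Labels are assumed to be non-negative integers (for np.bincount)."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(dists)[:k]]
        preds.append(np.bincount(nearest).argmax())
    return np.array(preds)

def under_bagging_knn(X, y, X_test, B=20, k=5, s=None, rng=None):
    """Under-bagging k-NN sketch: in each of B rounds, subsample every
    class down to (at most) s points without replacement, fit k-NN on
    the balanced subsample, and aggregate the votes across rounds."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    if s is None:
        s = counts.min()  # default: size of the smallest class
    votes = np.zeros((len(X_test), len(classes)), dtype=int)
    for _ in range(B):
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=min(int(s), n),
                       replace=False)
            for c, n in zip(classes, counts)
        ])
        preds = knn_predict(X[idx], y[idx], X_test, k)
        for j, c in enumerate(classes):
            votes[:, j] += (preds == c)
    # each test point gets the class with the most votes over B rounds
    return classes[votes.argmax(axis=1)]
```

On a toy two-cluster problem with a 10:1 class imbalance, each round trains on a class-balanced subsample, so the minority class is not drowned out in the neighbourhood votes.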