Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Under-bagging Nearest Neighbors for Imbalanced Classification

Authors: Hanyuan Hang, Yuchao Cai, Hanfang Yang, Zhouchen Lin

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On the practical side, we conduct numerical experiments to verify the theoretical results on the benefits of the under-bagging technique by the promising AM performance and efficiency of our proposed algorithm.
Researcher Affiliation | Academia | Hanyuan Hang EMAIL, Department of Applied Mathematics, University of Twente, 7522 NB Enschede, The Netherlands; Yuchao Cai EMAIL, School of Statistics, Renmin University of China, 100872 Beijing, China; Hanfang Yang EMAIL, Center for Applied Statistics, School of Statistics, Renmin University of China, 100872 Beijing, China; Zhouchen Lin EMAIL, Key Lab. of Machine Perception (MoE), School of Artificial Intelligence, Peking University; Institute for Artificial Intelligence, Peking University, 100872 Beijing, China; Peng Cheng Laboratory, 518055 Shenzhen, Guangdong, China
Pseudocode | Yes | Algorithm 1: Under-bagging k-NN Classifier for Imbalanced Classification
Open Source Code | No | The text does not provide any explicit statement or link to source code for the methodology described in the paper; it only mentions using third-party implementations such as scikit-learn and imbalanced-learn.
Open Datasets | Yes | These imbalanced datasets come from the UCI Machine Learning Repository (Dua and Graff, 2017).
Dataset Splits | Yes | We generate 20,000 positive samples and 200 negative samples in each run for training, and 200,000 positive samples and 2,000 negative samples for testing.
Hardware Specification | Yes | All experiments are conducted on a 64-bit machine with a 24-core Intel Xeon 2.0 GHz CPU (E5-4620) and 64 GB main memory.
Software Dependencies | No | The paper mentions using "scikit-learn" and "imbalanced-learn" implementations in Python, but does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | There are three hyper-parameters in the under-bagging k-NN for imbalanced classification: the bagging rounds B, the number of nearest neighbors k, and the expectation of the subsample size s. ... we fix the expected sub-sample size s = Mn(1), and vary the bagging rounds B ∈ {1, 2, 5, 10, 20, 50} and the number of neighbors k ∈ {1, 2, …, 30}. ... we fix the bagging rounds B = 20, and explore the performance under different sub-sample sizes s = a · Mn(1) with a ∈ {0.2, 0.4, …, 1.0}.
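Since no official implementation is linked from the paper, the following is a minimal NumPy sketch (not the authors' released code) of the under-bagging k-NN idea with the three hyper-parameters B, k, and s described above; function names and defaults here are illustrative assumptions.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Plain k-nearest-neighbour majority vote (Euclidean distance).
    Labels are assumed to be non-negative integers (for np.bincount)."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(dists)[:k]]
        preds.append(np.bincount(nearest).argmax())
    return np.array(preds)

def under_bagging_knn(X, y, X_test, B=20, k=5, s=None, rng=None):
    """Under-bagging k-NN sketch: in each of B rounds, subsample every
    class down to (at most) s points without replacement, fit k-NN on
    the balanced subsample, and aggregate the votes across rounds."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    if s is None:
        s = counts.min()  # default: size of the smallest class
    votes = np.zeros((len(X_test), len(classes)), dtype=int)
    for _ in range(B):
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=min(int(s), n),
                       replace=False)
            for c, n in zip(classes, counts)
        ])
        preds = knn_predict(X[idx], y[idx], X_test, k)
        for j, c in enumerate(classes):
            votes[:, j] += (preds == c)
    # each test point gets the class with the most votes over B rounds
    return classes[votes.argmax(axis=1)]
```

On a toy two-cluster problem with a 10:1 class imbalance, each round trains on a class-balanced subsample, so the minority class is not drowned out in the neighbourhood votes.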