Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Under-bagging Nearest Neighbors for Imbalanced Classification
Authors: Hanyuan Hang, Yuchao Cai, Hanfang Yang, Zhouchen Lin
JMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the practical side, we conduct numerical experiments to verify the theoretical results on the benefits of the under-bagging technique by the promising AM performance and efficiency of our proposed algorithm. |
| Researcher Affiliation | Academia | Hanyuan Hang, Department of Applied Mathematics, University of Twente, 7522 NB Enschede, The Netherlands; Yuchao Cai, School of Statistics, Renmin University of China, 100872 Beijing, China; Hanfang Yang, Center for Applied Statistics, School of Statistics, Renmin University of China, 100872 Beijing, China; Zhouchen Lin, Key Lab. of Machine Perception (MoE), School of Artificial Intelligence, Peking University; Institute for Artificial Intelligence, Peking University, Beijing, China; Peng Cheng Laboratory, 518055 Shenzhen, Guangdong, China |
| Pseudocode | Yes | Algorithm 1: Under-bagging k-NN Classifier for Imbalanced Classification |
| Open Source Code | No | The text does not provide any explicit statement or link for the source code of the methodology described in the paper. It only mentions using third-party implementations like scikit-learn and imbalanced-learn. |
| Open Datasets | Yes | These imbalanced datasets come from the UCI Machine Learning Repository (Dua and Graff, 2017). |
| Dataset Splits | Yes | We generate 20,000 positive samples and 200 negative samples in each run for training, and 200,000 positive samples and 2,000 negative samples for testing. |
| Hardware Specification | Yes | All experiments are conducted on a 64-bit machine with 24-cores Intel Xeon 2.0GHz CPU (E5-4620) and 64GB main memory. |
| Software Dependencies | No | The paper mentions using "scikit-learn" and "imbalanced-learn" implementations in Python, but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | There are three hyper-parameters in the under-bagging k-NN for imbalanced classification: the bagging rounds B, the number of nearest neighbors k, and the expectation of subsample size s. ... we fix the expected sub-sample size s = Mn(1), and vary the bagging rounds B ∈ {1, 2, 5, 10, 20, 50} and the number of neighbors k ∈ {1, 2, ..., 30}. ... we fix the bagging rounds B = 20, and explore the performance under different sub-sample size s = a · Mn(1) with a ∈ {0.2, 0.4, ..., 1.0}. |
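To make the role of the hyper-parameters B (bagging rounds) and k (nearest neighbors) concrete, the following is a minimal sketch of an under-bagging k-NN classifier. It assumes the standard under-bagging recipe: each round draws a class-balanced subsample by under-sampling the majority class to the minority-class size, fits a k-NN on it, and the rounds' predicted probabilities are averaged. Function and variable names here are illustrative, not the authors' code, which is not publicly released.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def under_bagging_knn_predict(X_train, y_train, X_test, B=20, k=5, seed=0):
    """Illustrative under-bagging k-NN: B rounds of balanced under-sampling,
    a k-NN fit per round, and averaged class probabilities across rounds."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y_train, return_counts=True)
    n_min = counts.min()  # minority-class size; each class is cut to this
    votes = np.zeros((len(X_test), len(classes)))
    for _ in range(B):
        # Under-sample every class to the minority size (balanced subsample).
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y_train == c), size=n_min, replace=False)
            for c in classes
        ])
        clf = KNeighborsClassifier(n_neighbors=k).fit(X_train[idx], y_train[idx])
        votes += clf.predict_proba(X_test)  # columns follow sorted class labels
    return classes[np.argmax(votes, axis=1)]
```

Because each round trains on only ~M·n_min points rather than the full sample, increasing B recovers stability at a fraction of the cost of a k-NN on the whole imbalanced training set, which is the efficiency benefit the experiments above measure.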