A Novel Model for Imbalanced Data Classification
Authors: Jian Yin, Chunjing Gan, Kaiqi Zhao, Xuan Lin, Zhe Quan, Zhi-Jie Wang
AAAI 2020, pp. 6680-6687
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the performance of our proposed model, we have conducted experiments based on 14 public datasets. The results show that our model outperforms the state-of-the-art methods in terms of recall, G-mean, F-measure and AUC. [Computation of these four metrics is sketched in code below the table.] |
| Researcher Affiliation | Academia | (1) School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China; (2) School of Computer Science, University of Auckland, Auckland, New Zealand; (3) College of Information Science and Engineering, Hunan University, Changsha, China; (4) Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou, China; (5) National Engineering Laboratory for Big Data Analysis and Applications, Beijing, China |
| Pseudocode | Yes | Algorithm 1 shows the pseudocode of the data block construction process. ... Algorithm 1: Data blocks construction [a hypothetical reading of this step is sketched below the table] |
| Open Source Code | No | The paper mentions datasets from "OpenML" and the "KEEL repository", which are external sources, but does not provide any link or statement about the availability of its own source code for the proposed method. |
| Open Datasets | Yes | In our experiments, we employ 14 widely used datasets. Among them, six datasets (including cm1, kc3, mw1, pc1, pc3, pc4) are from OpenML (Vanschoren et al. 2013), and they are open datasets for software defect detection. The other eight datasets (including yeast1vs7, abalone9vs18, yeast6, abalone19, poker89vs6, wine3vs5, abalone20, and poker8vs6) are from the KEEL repository (Fernández, del Jesus, and Herrera 2009). |
| Dataset Splits | No | In addition, similar to previous works, for all experiments we randomly split datasets into two parts: training set (70%) and test set (30%). The paper specifies a 70%/30% training/test split but does not mention a separate validation split [the split is sketched in code below the table]. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions machine learning techniques and classifiers (e.g., kNN, SVM, neural networks) but does not provide specific software names with version numbers (e.g., Python 3.x, PyTorch 1.x) that were used for implementation or experimentation. |
| Experiment Setup | Yes | Parameter settings. ... For our model, we tune each parameter by fixing the others. Specifically, we vary λ within the range of [0.1, 1] and select the values that lead to the best performance. Similarly, the number of blocks can be either ⌈IR⌉ or ⌊IR⌋, and we set the value that achieves the best performance. Besides, we set γ to 0.2 for all datasets and the cost ratio to IR. [The tuning loop is sketched below the table.] |
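
For reference, here is a minimal sketch of how the four metrics reported in the paper (recall, G-mean, F-measure, AUC) can be computed with scikit-learn. The labels and scores below are synthetic placeholders, not the paper's predictions; G-mean is the geometric mean of sensitivity and specificity.

```python
# Sketch: computing recall, G-mean, F-measure, and AUC with scikit-learn.
# The arrays below are synthetic placeholders, not the paper's data.
import numpy as np
from sklearn.metrics import recall_score, f1_score, roc_auc_score

y_true  = np.array([0, 0, 0, 0, 0, 0, 1, 1, 0, 1])   # 1 = minority class
y_pred  = np.array([0, 0, 1, 0, 0, 0, 1, 0, 0, 1])
y_score = np.array([0.1, 0.2, 0.6, 0.3, 0.1, 0.2, 0.9, 0.4, 0.3, 0.8])

recall = recall_score(y_true, y_pred)                    # sensitivity (TPR)
specificity = recall_score(y_true, y_pred, pos_label=0)  # TNR
g_mean = np.sqrt(recall * specificity)                   # G-mean = sqrt(TPR * TNR)
f_measure = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)

print(f"recall={recall:.3f}  G-mean={g_mean:.3f}  "
      f"F-measure={f_measure:.3f}  AUC={auc:.3f}")
```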
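The paper's Algorithm 1 (data blocks construction) is not reproduced in this report, so the following is only a hypothetical reading of such a step: partition the majority class into roughly ⌈IR⌉ blocks and pair each block with the full minority set, so that every block is near-balanced. `build_blocks` is our own name, not the authors'.

```python
# Hypothetical sketch in the spirit of the paper's Algorithm 1, not the
# authors' exact pseudocode. Assumes the majority class is the larger one.
import math
import numpy as np

def build_blocks(X_maj, X_min, rng=None):
    rng = np.random.default_rng(rng)
    ir = len(X_maj) / len(X_min)       # imbalance ratio
    n_blocks = math.ceil(ir)           # paper uses ceil(IR) or floor(IR)
    idx = rng.permutation(len(X_maj))  # shuffle majority samples
    blocks = []
    for part in np.array_split(idx, n_blocks):
        # each block = one majority slice + all minority samples
        blocks.append((X_maj[part], X_min))
    return blocks
```

Each resulting block can then be used to train one member of an ensemble, which is a common way such block constructions are consumed.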
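The 70%/30% random split can be reproduced with scikit-learn as below. The `stratify` argument and the synthetic dataset are our assumptions (the paper only says "randomly split"), but stratification is common practice for imbalanced data.

```python
# Sketch of the 70/30 random split described in the paper.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# placeholder imbalanced data standing in for one of the 14 datasets
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
```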
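Finally, the coordinate-wise tuning described under "Parameter settings" (vary λ over [0.1, 1] while fixing the other parameters) could look like the sketch below. `train_and_score` is a hypothetical stand-in for training the authors' model and returning a validation metric such as G-mean.

```python
# Sketch of coordinate-wise tuning: sweep lambda with other parameters
# fixed, keeping the value with the best score. `train_and_score` is a
# hypothetical callback, not part of the paper's released artifacts.
import numpy as np

GAMMA = 0.2  # fixed for all datasets, per the paper

def tune_lambda(train_and_score, n_blocks, cost_ratio):
    best_lam, best_score = None, -np.inf
    for lam in np.arange(0.1, 1.0 + 1e-9, 0.1):  # lambda in [0.1, 1]
        score = train_and_score(lam=lam, gamma=GAMMA,
                                n_blocks=n_blocks, cost_ratio=cost_ratio)
        if score > best_score:
            best_lam, best_score = lam, score
    return best_lam, best_score
```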