A Novel Model for Imbalanced Data Classification
Authors: Jian Yin, Chunjing Gan, Kaiqi Zhao, Xuan Lin, Zhe Quan, Zhi-Jie Wang
AAAI 2020, pp. 6680-6687
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the performance of our proposed model, we have conducted experiments based on 14 public datasets. The results show that our model outperforms the state-of-the-art methods in terms of recall, G-mean, F-measure and AUC. [Computation of these four metrics is sketched in code below the table.] |
| Researcher Affiliation | Academia | (1) School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China; (2) School of Computer Science, University of Auckland, Auckland, New Zealand; (3) College of Information Science and Engineering, Hunan University, Changsha, China; (4) Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou, China; (5) National Engineering Laboratory for Big Data Analysis and Applications, Beijing, China |
| Pseudocode | Yes | Algorithm 1 shows the pseudocode of the data block construction process. ... Algorithm 1: Data blocks construction [a hypothetical reading of this step is sketched below the table] |
| Open Source Code | No | The paper mentions datasets from "OpenML" and the "KEEL repository", which are external sources, but does not provide any link or statement about the availability of its own source code for the proposed method. |
| Open Datasets | Yes | In our experiments, we employ 14 widely used datasets. Among them, six datasets (including cm1, kc3, mw1, pc1, pc3, pc4) are from OpenML (Vanschoren et al. 2013), and they are open datasets for software defect detection. The other eight datasets (including yeast1vs7, abalone9vs18, yeast6, abalone19, poker89vs6, wine3vs5, abalone20, and poker8vs6) are from the KEEL repository (Fernández, del Jesus, and Herrera 2009). |
| Dataset Splits | No | In addition, similar to previous works, for all experiments we randomly split datasets into two parts: training set (70%) and test set (30%). The paper specifies a 70%/30% training/test split but does not mention a separate validation split [the split is sketched in code below the table]. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions machine learning techniques and classifiers (e.g., kNN, SVM, neural networks) but does not provide specific software names with version numbers (e.g., Python 3.x, PyTorch 1.x) that were used for implementation or experimentation. |
| Experiment Setup | Yes | Parameter settings. ... For our model, we tune each parameter by fixing the others. Specifically, we vary λ within the range of [0.1, 1] and select the values that lead to the best performance. Similarly, the number of blocks can be either ⌈IR⌉ or ⌊IR⌋, and we set the value that achieves the best performance. Besides, we set γ to 0.2 for all datasets and the cost ratio to IR. [The tuning loop is sketched below the table.] |
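
For reference, here is a minimal sketch of how the four metrics reported in the paper (recall, G-mean, F-measure, AUC) can be computed with scikit-learn. The labels and scores below are synthetic placeholders, not the paper's predictions; G-mean is the geometric mean of sensitivity and specificity.

```python
# Sketch: computing recall, G-mean, F-measure, and AUC with scikit-learn.
# The arrays below are synthetic placeholders, not the paper's data.
import numpy as np
from sklearn.metrics import recall_score, f1_score, roc_auc_score

y_true  = np.array([0, 0, 0, 0, 0, 0, 1, 1, 0, 1])   # 1 = minority class
y_pred  = np.array([0, 0, 1, 0, 0, 0, 1, 0, 0, 1])
y_score = np.array([0.1, 0.2, 0.6, 0.3, 0.1, 0.2, 0.9, 0.4, 0.3, 0.8])

recall = recall_score(y_true, y_pred)                    # sensitivity (TPR)
specificity = recall_score(y_true, y_pred, pos_label=0)  # TNR
g_mean = np.sqrt(recall * specificity)                   # G-mean = sqrt(TPR * TNR)
f_measure = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)

print(f"recall={recall:.3f}  G-mean={g_mean:.3f}  "
      f"F-measure={f_measure:.3f}  AUC={auc:.3f}")
```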
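The paper's Algorithm 1 (data blocks construction) is not reproduced in this report, so the following is only a hypothetical reading of such a step: partition the majority class into roughly ⌈IR⌉ blocks and pair each block with the full minority set, so that every block is near-balanced. `build_blocks` is our own name, not the authors'.

```python
# Hypothetical sketch in the spirit of the paper's Algorithm 1, not the
# authors' exact pseudocode. Assumes the majority class is the larger one.
import math
import numpy as np

def build_blocks(X_maj, X_min, rng=None):
    rng = np.random.default_rng(rng)
    ir = len(X_maj) / len(X_min)       # imbalance ratio
    n_blocks = math.ceil(ir)           # paper uses ceil(IR) or floor(IR)
    idx = rng.permutation(len(X_maj))  # shuffle majority samples
    blocks = []
    for part in np.array_split(idx, n_blocks):
        # each block = one majority slice + all minority samples
        blocks.append((X_maj[part], X_min))
    return blocks
```

Each resulting block can then be used to train one member of an ensemble, which is a common way such block constructions are consumed.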
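The 70%/30% random split can be reproduced with scikit-learn as below. The `stratify` argument and the synthetic dataset are our assumptions (the paper only says "randomly split"), but stratification is common practice for imbalanced data.

```python
# Sketch of the 70/30 random split described in the paper.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# placeholder imbalanced data standing in for one of the 14 datasets
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
```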
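Finally, the coordinate-wise tuning described under "Parameter settings" (vary λ over [0.1, 1] while fixing the other parameters) could look like the sketch below. `train_and_score` is a hypothetical stand-in for training the authors' model and returning a validation metric such as G-mean.

```python
# Sketch of coordinate-wise tuning: sweep lambda with other parameters
# fixed, keeping the value with the best score. `train_and_score` is a
# hypothetical callback, not part of the paper's released artifacts.
import numpy as np

GAMMA = 0.2  # fixed for all datasets, per the paper

def tune_lambda(train_and_score, n_blocks, cost_ratio):
    best_lam, best_score = None, -np.inf
    for lam in np.arange(0.1, 1.0 + 1e-9, 0.1):  # lambda in [0.1, 1]
        score = train_and_score(lam=lam, gamma=GAMMA,
                                n_blocks=n_blocks, cost_ratio=cost_ratio)
        if score > best_score:
            best_lam, best_score = lam, score
    return best_lam, best_score
```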