Multiset Feature Learning for Highly Imbalanced Data Classification

Authors: Fei Wu, Xiao-Yuan Jing, Shiguang Shan, Wangmeng Zuo, Jing-Yu Yang

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on five highly imbalanced datasets indicate that UCML outperforms state-of-the-art imbalanced learning methods.
Researcher Affiliation | Academia | (1) State Key Laboratory of Software Engineering, School of Computer, Wuhan University, China; (2) College of Automation, Nanjing University of Posts and Telecommunications, China; (3) Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, China; (4) School of Computer, Harbin Institute of Technology, China; (5) College of Computer Science and Technology, Nanjing University of Science and Technology, China
Pseudocode | No | The paper describes the objective function and solution steps mathematically but does not present them in pseudocode or algorithm-block format.
Open Source Code | No | No explicit statement about open-sourcing code, and no link to a code repository.
Open Datasets | Yes | Table 1 shows properties of five highly imbalanced datasets derived from various application fields (Menzies et al. 2007; Alcalá-Fdez et al. 2011); the majority class samples severely outnumber the minority class samples. Footnote: UC Irvine Machine Learning Repository, http://archive.ics.uci.edu/ml/, 2009.
Dataset Splits | Yes | In experiments, we randomly select 50% of the samples to construct the training set for all datasets and use the remaining samples for testing. We repeat the random selection 20 times and record the average results. The parameter σ² in the weighted uncorrelated constraint is set by 5-fold cross-validation on the training set. (See the first sketch after this table.)
Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments are mentioned.
Software Dependencies | No | No specific software dependencies with version numbers are mentioned.
Experiment Setup | Yes | Assume that the first class is the majority class and the second class is the minority class. The misclassification costs cost_{1,2} and cost_{2,1} are set separately as cost_{1,2} = 1 and cost_{2,1} = round(N_1 / N_2), where N_1 and N_2 denote the numbers of majority and minority class samples. The parameter σ² in the weighted uncorrelated constraint is set by 5-fold cross-validation on the training set: we search the parameter space σ² ∈ {2^{-3}, 2^{-2}, 2^{-1}, 2^0, 2^1, 2^2, 2^3} · σ_0², where σ_0² is the mean square distance of the training data. For simplicity, we set σ² = 2σ_0² on PC1; a similar phenomenon also exists on the other datasets. For the j-th set (j = 1, 2, ..., v), we first use the nearest neighbor (NN) classifier with the cosine distance to classify Z_j^T on Z_j^X, obtaining v predicted results for each testing sample in T; the majority voting strategy is then adopted to make the final decision for each test sample. (See the second sketch after this table.)
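
A minimal sketch of the split-and-average protocol quoted in the Dataset Splits row: 20 random 50/50 train/test splits with averaged results. The 1-NN classifier and F1 score are stand-ins for the paper's UCML model and its reported metrics (not the authors' code), and `run_protocol` is a hypothetical name. Whether the paper's random selection is stratified is not stated, so plain shuffling is used here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score

def run_protocol(X, y, n_repeats=20, seed=0):
    """Average a score over 20 random 50/50 train/test splits."""
    scores = []
    for r in range(n_repeats):
        # 50% of the samples for training, the remaining 50% for testing
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.5, random_state=seed + r)
        # Stand-in model; the paper would train UCML here instead
        clf = KNeighborsClassifier(n_neighbors=1, metric="cosine")
        clf.fit(X_tr, y_tr)
        scores.append(f1_score(y_te, clf.predict(X_te)))
    # Record the average (and spread) over the 20 repetitions
    return float(np.mean(scores)), float(np.std(scores))
```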
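
A hedged sketch of the Experiment Setup row, under stated assumptions: label 0 is the majority class and label 1 the minority class, the majority class is split randomly into v subsets (the paper's partitioning strategy may differ), and the learned UCML projections Z_j are replaced by raw features. All function names are illustrative, not the authors' API.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.neighbors import KNeighborsClassifier

def misclassification_costs(y):
    """cost_{1,2} = 1; cost_{2,1} = round(N_1 / N_2), per the setup above.
    Assumes label 0 is the majority class and label 1 the minority class."""
    n1, n2 = int(np.sum(y == 0)), int(np.sum(y == 1))
    return 1, int(round(n1 / n2))

def sigma2_grid(X_train):
    """Candidate sigma^2 values: {2^-3, ..., 2^3} * sigma_0^2, where sigma_0^2
    is the mean square pairwise (assumed Euclidean) distance of training data."""
    sigma0_sq = float(np.mean(pdist(X_train, "sqeuclidean")))
    return [2.0 ** k * sigma0_sq for k in range(-3, 4)]

def multiset_nn_vote(X_tr, y_tr, X_te, v=5, rng=None):
    """1-NN with cosine distance on each of v sets, then majority voting.
    Raw features stand in for the learned UCML projections Z_j."""
    if rng is None:
        rng = np.random.default_rng(0)
    maj = rng.permutation(np.where(y_tr == 0)[0])  # shuffled majority indices
    mino = np.where(y_tr == 1)[0]                  # all minority indices
    votes = np.zeros((v, len(X_te)), dtype=int)
    for j, chunk in enumerate(np.array_split(maj, v)):
        # j-th set: one majority subset plus the whole minority class
        idx = np.concatenate([chunk, mino])
        nn = KNeighborsClassifier(n_neighbors=1, metric="cosine")
        nn.fit(X_tr[idx], y_tr[idx])
        votes[j] = nn.predict(X_te)
    # Majority vote over the v predictions; ties (even v) go to the minority class
    return (votes.sum(axis=0) * 2 >= v).astype(int)
```

With an odd v (e.g., v = 5) the vote can never tie, which is one reason the majority-voting stage is usually paired with an odd number of sets.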