Multiset Feature Learning for Highly Imbalanced Data Classification
Authors: Fei Wu, Xiao-Yuan Jing, Shiguang Shan, Wangmeng Zuo, Jing-Yu Yang
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on five highly imbalanced datasets indicate that UCML outperforms state-of-the-art imbalanced learning methods. |
| Researcher Affiliation | Academia | (1) State Key Laboratory of Software Engineering, School of Computer, Wuhan University, China; (2) College of Automation, Nanjing University of Posts and Telecommunications, China; (3) Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, China; (4) School of Computer, Harbin Institute of Technology, China; (5) College of Computer Science and Technology, Nanjing University of Science and Technology, China |
| Pseudocode | No | The paper describes the objective function and solution steps mathematically but does not present them in pseudocode or algorithm block format. |
| Open Source Code | No | No explicit statement about open-sourcing code or a link to a code repository. |
| Open Datasets | Yes | Table 1 shows properties of five highly imbalanced datasets derived from various application fields (Menzies et al. 2007; Alcalá-Fdez et al. 2011). We can see that the majority class samples severely outnumber the minority class samples. Footnote 1: UC Irvine Machine Learning Repository, http://archive.ics.uci.edu/ml/, 2009. |
| Dataset Splits | Yes | In experiments, we randomly select 50% of the samples to construct the training set for all datasets, and use the remaining samples for testing. We repeat the random selection 20 times and record the average results. The parameter σ² in the weighted uncorrelated constraint is set by using 5-fold cross validation on the training set. (A sketch of this split protocol follows the table.) |
| Hardware Specification | No | No specific hardware details (e.g., CPU, GPU models, memory) used for running experiments are mentioned. |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned. |
| Experiment Setup | Yes | Assume that the first class is the majority class and the second class is the minority class. Then cost_{1,2} and cost_{2,1} are separately set as cost_{1,2} = 1 and cost_{2,1} = the rounded value of N_1/N_2, where N_1 and N_2 denote the numbers of majority and minority class samples. The parameter σ² in the weighted uncorrelated constraint is set by using 5-fold cross validation on the training set. For the parameter σ², we search the parameter space σ² ∈ {2⁻³, 2⁻², 2⁻¹, 2⁰, 2¹, 2², 2³} × σ₀², where σ₀² is the mean square distance of the training data. For simplicity, we set σ² as 2σ₀² on PC1; a similar phenomenon also exists on the other datasets. For the j-th set (j = 1, 2, ..., v), we first use the nearest neighbor (NN) classifier with the cosine distance to classify Z_T^j against Z_X^j. We thus obtain v predicted results for each testing sample in T, and adopt the majority voting strategy to make the final decision for each test sample. (Sketches of the cost setting, the parameter grid, and the voting rule follow the table.) |
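
The split protocol quoted in the Dataset Splits row (20 random 50/50 splits with averaged results) is straightforward to reproduce. Below is a minimal Python sketch; `run_ucml` is a hypothetical stand-in for training and scoring the paper's UCML model, since no reference implementation is released.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def evaluate(X, y, run_ucml, n_repeats=20, seed=0):
    """Average a score over 20 random 50/50 train/test splits."""
    rng = np.random.RandomState(seed)
    scores = []
    for _ in range(n_repeats):
        # The paper only says "randomly select 50% samples", so no
        # stratification is applied here.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.5, random_state=rng)
        scores.append(run_ucml(X_tr, y_tr, X_te, y_te))  # e.g. F-measure
    return float(np.mean(scores)), float(np.std(scores))
```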
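The misclassification-cost assignment and the kernel-width grid quoted in the Experiment Setup row translate directly into code. This is a sketch under the stated assumptions; the helper names `set_costs` and `sigma_grid` are illustrative, not from the authors' code.

```python
import numpy as np
from scipy.spatial.distance import pdist

def set_costs(n_majority, n_minority):
    """cost_{1,2} = 1 and cost_{2,1} = rounded value of N1/N2."""
    return 1.0, round(n_majority / n_minority)

def sigma_grid(X_train):
    """Candidate kernel widths {2^-3, ..., 2^3} * sigma0^2, where sigma0^2
    is the mean square distance of the training data; the paper selects
    among them with 5-fold cross validation on the training set."""
    sigma0_sq = pdist(X_train, metric="sqeuclidean").mean()
    return [2.0 ** k * sigma0_sq for k in range(-3, 4)]
```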
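Finally, the decision rule, a 1-NN classifier with cosine distance per feature set followed by majority voting, can be sketched as follows. `Z_train_sets` and `Z_test_sets` are assumed to hold the v transformed training and test sets (Z_X^j and Z_T^j in the quote), and `multiset_predict` is a hypothetical name; class labels are assumed numeric.

```python
import numpy as np
from scipy.stats import mode
from sklearn.neighbors import KNeighborsClassifier

def multiset_predict(Z_train_sets, y_train, Z_test_sets):
    """1-NN (cosine distance) per feature set, then majority voting."""
    votes = []
    for Z_tr, Z_te in zip(Z_train_sets, Z_test_sets):
        nn = KNeighborsClassifier(n_neighbors=1, metric="cosine")
        nn.fit(Z_tr, y_train)
        votes.append(nn.predict(Z_te))  # one prediction per test sample
    votes = np.vstack(votes)            # shape (v, n_test)
    return mode(votes, axis=0, keepdims=False).mode  # majority vote
```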