Towards Class-Imbalance Aware Multi-Label Learning
Authors: Min-Ling Zhang, Yu-Kun Li, Xu-Ying Liu
IJCAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate the effectiveness of the proposed approach, particularly on imbalance-specific evaluation metrics such as F-measure and area under the ROC curve. Comparative studies across thirteen publicly available multi-label data sets show that COCOA achieves highly competitive performance under class-imbalance scenarios. |
| Researcher Affiliation | Academia | Min-Ling Zhang, Yu-Kun Li, Xu-Ying Liu. School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, China. {zhangml, liyk, liuxy}@seu.edu.cn |
| Pseudocode | Yes | Table 1: The pseudo-code of COCOA. |
| Open Source Code | No | The paper discusses using existing open-source libraries such as Weka and MULAN, but provides no statement or link indicating that the COCOA implementation itself has been open-sourced. |
| Open Datasets | Yes | To serve as a solid basis for performance evaluation, a total of thirteen benchmark multi-label data sets have been collected for experimental studies. Table 2 summarizes characteristics of the experimental data sets, which are roughly ordered according to |S|. As shown in Table 2, the thirteen data sets exhibit diversified properties from different aspects. These data sets cover a broad range of scenarios, including music (CAL500, Emotions), image (Scene, Corel5k), video (Mediamill), biology (Yeast), and text (the others). |
| Dataset Splits | Yes | Each data set is randomly split for training and testing, where 50% of examples are chosen to form the training set and the remaining ones form the test set. The random train/test splits are repeated ten times, and the mean metric value as well as the standard deviation are recorded. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions "Weka platform with J48 decision tree (C4.5 implementation in Weka)" and "MULAN multi-label learning library (upon Weka platform)", but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | For COCOA, both the binary-class and multi-class imbalance learners (B and M) are implemented in Weka using the J48 decision tree with undersampling [Hall et al., 2009]. Furthermore, the number of coupling class labels is set as K = min(q − 1, 10), where q is the number of class labels. For fair comparison, the ensemble size for USAM-EN and SMOTE-EN is also set to 10. |
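The evaluation protocol reported in the table (ten repeated 50/50 random splits with mean and standard deviation recorded, and K = min(q − 1, 10) coupling class labels) can be sketched as follows. This is a minimal illustration of the protocol, not the paper's code; the function names and the toy data are our own, and the per-split "metric" is a stand-in for a real model's score.

```python
import random
import statistics

def coupling_k(q, cap=10):
    """Number of coupling class labels per binary problem: K = min(q - 1, cap)."""
    return min(q - 1, cap)

def repeated_holdout(examples, n_repeats=10, train_frac=0.5, seed=0):
    """Yield (train_indices, test_indices) for repeated random holdout splits."""
    rng = random.Random(seed)
    n = len(examples)
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        cut = int(n * train_frac)
        yield idx[:cut], idx[cut:]

# Toy usage: 100 examples, ten 50/50 splits, report mean and std of a
# per-split quantity. In the real protocol a model would be trained on
# train_idx and scored on test_idx; here the test-set fraction stands in.
data = list(range(100))
metric_per_split = []
for train_idx, test_idx in repeated_holdout(data, n_repeats=10):
    metric_per_split.append(len(test_idx) / len(data))

mean = statistics.mean(metric_per_split)
std = statistics.pstdev(metric_per_split)
```

With q = 5 labels this gives K = 4 coupling labels; with q = 50 it caps at K = 10, matching the ensemble size used for the comparison methods.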
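The setup also relies on undersampling to feed the imbalance learners. A minimal sketch of random undersampling for a binary label, assuming the common "downsample the majority class to the minority size" variant (one standard approach, not necessarily COCOA's exact Weka implementation):

```python
import random
from collections import Counter

def undersample(X, y, seed=0):
    """Random undersampling for a binary problem: keep every minority-class
    example and an equally sized random subset of the majority class.
    Illustrative helper; not taken from the paper."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    majority = max(counts, key=counts.get)
    min_idx = [i for i, lab in enumerate(y) if lab == minority]
    maj_idx = [i for i, lab in enumerate(y) if lab == majority]
    keep = min_idx + rng.sample(maj_idx, len(min_idx))
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy usage: a 90/10 imbalanced label becomes a balanced 10/10 sample.
X = [[i] for i in range(100)]
y = [0] * 90 + [1] * 10
Xb, yb = undersample(X, y)
```

The balanced sample is then what a base learner such as J48 would be trained on, one sampled learner per ensemble member.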