Towards Class-Imbalance Aware Multi-Label Learning

Authors: Min-Ling Zhang, Yu-Kun Li, Xu-Ying Liu

IJCAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments validate the effectiveness of the proposed approach, especially in terms of imbalance-specific evaluation metrics such as the F-measure and the area under the ROC curve. Comparative studies across thirteen publicly available multi-label data sets show that COCOA achieves highly competitive performance under the class-imbalance scenario.
Researcher Affiliation | Academia | Min-Ling Zhang, Yu-Kun Li, Xu-Ying Liu; School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, China; {zhangml, liyk, liuxy}@seu.edu.cn
Pseudocode | Yes | The paper presents the pseudo-code of COCOA in Table 1.
Open Source Code | No | The paper mentions existing open-source libraries such as Weka and MULAN but provides no statement or link releasing an implementation of the COCOA method itself.
Open Datasets | Yes | To serve as a solid basis for performance evaluation, a total of thirteen benchmark multi-label data sets have been collected for experimental studies. Table 2 summarizes characteristics of the experimental data sets, which are roughly ordered according to |S|. As shown in Table 2, the thirteen data sets exhibit diversified properties. These data sets cover a broad range of scenarios, including music (CAL500, Emotions), image (Scene, Corel5k), video (Mediamill), biology (Yeast), and text (the others).
Dataset Splits | Yes | Each data set is randomly split for training and testing, where 50% of the examples are chosen to form the training set and the remaining ones form the test set. The random train/test splits are repeated ten times, and the mean metric value as well as the standard deviation are recorded.
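The evaluation protocol described above (ten random 50/50 train/test splits, reporting the mean and standard deviation of each metric) can be sketched as follows. This is a generic sketch, not the paper's code; the data and the `evaluate` metric function are placeholders.

```python
import random
import statistics

def repeated_holdout(examples, evaluate, n_repeats=10, train_frac=0.5, seed=0):
    """Run repeated random train/test splits and aggregate a metric.

    `evaluate(train, test)` is a placeholder standing in for training a
    model on `train` and returning a metric value measured on `test`.
    Returns (mean, standard deviation) over the repeats.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        shuffled = examples[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        train, test = shuffled[:cut], shuffled[cut:]
        scores.append(evaluate(train, test))
    return statistics.mean(scores), statistics.stdev(scores)

# Toy usage: the "metric" is just the fraction of even-valued test examples.
data = list(range(100))
mean, std = repeated_holdout(
    data, lambda tr, te: sum(x % 2 == 0 for x in te) / len(te))
```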
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions the "Weka platform with J48 decision tree (C4.5 implementation in Weka)" and the "MULAN multi-label learning library (upon Weka platform)", but does not provide version numbers for these software components.
Experiment Setup | Yes | For COCOA, both the binary-class and multi-class imbalance learners (B and M) are implemented in Weka using the J48 decision tree with undersampling [Hall et al., 2009]. Furthermore, the number of coupling class labels is set as K = min(q − 1, 10), where q is the number of class labels. For a fair comparison, the ensemble size for USAM-EN and SMOTE-EN is also set to 10.
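As a rough illustration of the setup above, the following sketch shows generic random undersampling (the majority class is cut down to the minority-class size) and the K = min(q − 1, 10) rule for the number of coupling labels. This is an assumption-laden illustration: the paper uses Weka's J48 with undersampling rather than this exact routine, and `undersample` / `num_coupling_labels` are hypothetical helper names.

```python
import random

def undersample(examples, labels, seed=0):
    """Randomly undersample every class down to the minority-class size.

    Generic random undersampling for illustration only; not the paper's
    Weka-based implementation.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):  # keep n_min examples per class
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

def num_coupling_labels(q):
    """Number of coupling class labels: K = min(q - 1, 10)."""
    return min(q - 1, 10)
```

For example, with q = 6 labels the rule yields K = 5, while for large label spaces K is capped at 10.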