MEPSI: An MDL-Based Ensemble Pruning Approach with Structural Information
Authors: Xiao-Dong Bi, Shao-Qun Zhang, Yuan Jiang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The comparative experiments conducted on multiple real-world data sets demonstrate the effectiveness of our proposed method. ... We conduct experiments to compare MEPSI with 13 ensemble pruning methods on 11 classification data sets. The numerical results show that our method achieves good results and sufficiently outperforms other methods on average. |
| Researcher Affiliation | Academia | 1 National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China 2 School of Artificial Intelligence, Nanjing University, Nanjing 210023, China 3 School of Intelligent Science and Technology, Nanjing University, Suzhou 215163, China {bixd, zhangsq, jiangy}@lamda.nju.edu.cn |
| Pseudocode | Yes | Algorithm 1: The Tree-based Implementation for MEPSI |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | Yes | The tree-based MEPSI is evaluated on 11 real-world image data sets and tabular data sets, including the scikit-learn digits (Pedregosa et al. 2011), USPS (Hull 1994), and multiple UCI data sets (Kelly, Longjohn, and Nottingham 2017), which are most frequently used in ensemble learning and ensemble pruning (Li and Zhou 2009; Rodriguez, Kuncheva, and Alonso 2006; Sun and Zhou 2018; Zhou and Tang 2003; Zhou and Feng 2019). |
| Dataset Splits | Yes (see the data-loading sketch below the table) | The default methods to split train and test data are employed if the data sets have been split into the training and testing parts. Otherwise, we randomly split the data for training and testing. The detailed split configurations of data sets are shown in the appendix. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'implementation of scikit-learn’s random forest (Pedregosa et al. 2011)' but does not provide specific version numbers for scikit-learn or any other software dependencies. |
| Experiment Setup | Yes (see the ensemble sketch below the table) | For the generation of decision trees, we employ both bootstrap sampling and random feature selection approaches to train 200 CART decision trees (Breiman 2017) and select 20 of them to combine and make predictions. The trees are generated according to the implementation of scikit-learn’s random forest (Pedregosa et al. 2011), and the detailed settings and hyperparameters of random forests are shown in the appendix. For the hyperparameter of Algorithm 1, we also show the setting of the trade-off weight λ in the appendix. |
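
The data-preparation step quoted in the Dataset Splits row can be approximated with standard scikit-learn calls. This is a minimal sketch, assuming an 80/20 random split with a fixed seed; the paper's actual per-dataset split ratios appear only in its appendix, so both numbers here are placeholders rather than reported settings.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# The scikit-learn digits set is one of the 11 data sets named in the paper.
X, y = load_digits(return_X_y=True)

# For data sets without a default split, the paper splits randomly; the
# 80/20 ratio and random_state below are assumptions, not paper settings.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```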
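
The generation step from the Experiment Setup row (200 bagged CART trees, pruned to 20) can likewise be sketched with scikit-learn, continuing from the split above. The forest settings below match what the paper states in the main text; everything else, including the `prune_to_k` selection rule, is a hypothetical stand-in, since MEPSI's MDL-with-structural-information criterion (Algorithm 1) is defined in the paper itself and is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# 200 CART trees with bootstrap sampling and random feature selection, as
# stated in the paper; random_state is an assumption added for repeatability.
forest = RandomForestClassifier(
    n_estimators=200,
    bootstrap=True,
    max_features="sqrt",
    random_state=0,
).fit(X_train, y_train)

def prune_to_k(forest, X_val, y_val, k=20):
    """Hypothetical stand-in for MEPSI's selection rule: keep the k trees
    with the highest individual accuracy on held-out data. The paper's
    actual criterion is MDL-based and uses structural information."""
    scores = [tree.score(X_val, y_val) for tree in forest.estimators_]
    top = np.argsort(scores)[-k:]
    return [forest.estimators_[i] for i in top]

# Using the test split as a stand-in validation set purely for brevity; a
# faithful reproduction would prune on a separate validation set.
pruned = prune_to_k(forest, X_test, y_test, k=20)

# Combine the 20 selected trees by plain majority vote. For the digits
# labels (0-9), each tree's encoded class indices coincide with the labels.
votes = np.stack([t.predict(X_test) for t in pruned]).astype(int)
y_pred = np.array([np.bincount(col).argmax() for col in votes.T])
print("pruned-ensemble accuracy:", (y_pred == y_test).mean())
```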