MESA: Boost Ensemble Imbalanced Learning with MEta-SAmpler

Authors: Zhining Liu, Pengfei Wei, Jing Jiang, Wei Cao, Jiang Bian, Yi Chang

NeurIPS 2020

Reproducibility report — each entry lists the variable, the result, and the supporting excerpt from the paper.
Research Type: Experimental. "Extensive experiments on both synthetic and real-world tasks demonstrate the effectiveness, robustness, and transferability of MESA. Our code is available at https://github.com/ZhiningLiu1998/mesa." (Section 4: "To thoroughly assess the effectiveness of MESA, two series of experiments are conducted: one on controlled synthetic toy datasets for visualization and the other on real-world imbalanced datasets to validate MESA's performance in practical applications.")
Researcher Affiliation: Collaboration. Zhining Liu (Jilin University, znliu19@mails.jlu.edu.cn); Pengfei Wei (National University of Singapore, dcsweip@nus.edu.sg); Jing Jiang (University of Technology Sydney, jing.jiang@uts.edu.au); Wei Cao (Microsoft Research, weicao@microsoft.com); Jiang Bian (Microsoft Research, jiang.bian@microsoft.com); Yi Chang (Jilin University, yichang@jlu.edu.cn)
Pseudocode: Yes. Algorithm 1: Sample(Dτ; F, µ, σ); Algorithm 2: Ensemble training in MESA; Algorithm 3: Meta-training in MESA.
Open Source Code: Yes. "Our code is available at https://github.com/ZhiningLiu1998/mesa."
Open Datasets: Yes. "We extend the experiments to real-world imbalanced classification tasks from the UCI repository [10] and KDD CUP 2004. For each dataset, we hold out a 20% validation set and report the result of 4-fold stratified cross-validation (i.e., a 60%/20%/20% training/validation/test split)."
Dataset Splits: Yes. "For each dataset, we hold out a 20% validation set and report the result of 4-fold stratified cross-validation (i.e., a 60%/20%/20% training/validation/test split)."
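The split protocol described above (a fixed 20% stratified hold-out plus 4-fold stratified cross-validation over the remaining 80%, yielding 60%/20%/20% per fold) can be sketched as follows. This is a minimal illustration assuming scikit-learn; the dataset is synthetic and the variable names are illustrative, not the paper's actual code or data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, train_test_split

# Illustrative imbalanced dataset (not the paper's UCI / KDD CUP 2004 data):
# 2,000 majority vs. 200 minority samples, matching an imbalance ratio of 10.
X, y = make_classification(n_samples=2200, weights=[10 / 11],
                           flip_y=0, random_state=0)

# Hold out a fixed 20% validation set, stratified by class label.
X_rest, X_heldout, y_rest, y_heldout = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# 4-fold stratified CV on the remaining 80%: in each fold, 3/4 of it
# (60% of all data) is used for training and 1/4 (20%) for testing.
skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X_rest, y_rest):
    X_train, X_test = X_rest[train_idx], X_rest[test_idx]
    y_train, y_test = y_rest[train_idx], y_rest[test_idx]
    # fit and evaluate a model here
```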
Hardware Specification: No. The paper does not specify the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies: No. The paper does not provide version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup: Yes. "Setup Details. We build a series of imbalanced toy datasets corresponding to different levels of underlying class distribution overlapping, as shown in Fig. 3. All the datasets have the same imbalance ratio (|N|/|P| = 2,000/200 = 10). In this experiment, MESA is compared with four representative EIL algorithms from 4 major EIL branches (Parallel/Iterative Ensemble + Under/Over-sampling), i.e., SMOTEBOOST [7], SMOTEBAGGING [42], RUSBOOST [35], and UNDERBAGGING [2]. All EIL methods are deployed with decision trees as base classifiers with an ensemble size of 5."
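To make the baseline configuration concrete, here is a minimal sketch of one of the cited comparison methods, an UnderBagging-style ensemble (random under-sampling plus bagging) with decision-tree base classifiers and ensemble size 5, on toy data with the stated |N|/|P| = 2,000/200 = 10 ratio. This is not MESA's meta-sampler, only an assumed illustration of the baseline family using scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Toy imbalanced data matching the stated ratio: 2,000 negatives, 200 positives.
X, y = make_classification(n_samples=2200, weights=[2000 / 2200],
                           flip_y=0, random_state=0)
pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]

# UnderBagging-style baseline: each of the 5 decision trees is trained on
# all minority samples plus an equal-size random subset of the majority class.
ensemble = []
for _ in range(5):
    sub_neg = rng.choice(neg, size=len(pos), replace=False)
    idx = np.concatenate([pos, sub_neg])
    ensemble.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Predict by averaging the members' positive-class probability estimates.
proba = np.mean([clf.predict_proba(X)[:, 1] for clf in ensemble], axis=0)
y_pred = (proba >= 0.5).astype(int)
```

Balancing each bootstrap subset (200 vs. 200) is what keeps the individual trees from collapsing onto the majority class, which is the core idea shared by the under-sampling EIL baselines compared against in the paper.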