A Framework of Online Learning with Imbalanced Streaming Data

Authors: Yan Yan, Tianbao Yang, Yi Yang, Jianhui Chen

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical studies demonstrate the competitive if not better performance of the proposed method compared to previous cost-sensitive and resampling based online learning algorithms and those that are designed for optimizing certain measures. In this section, we evaluate OMCSL for optimizing three measures, F-measure, AUROC and AUPRC, and compare with competing online learning algorithms on three public imbalanced datasets.
Researcher Affiliation | Collaboration | Yan Yan (1), Tianbao Yang (2), Yi Yang (1), Jianhui Chen (3). (1) QCIS, University of Technology Sydney, 15 Broadway, Ultimo NSW 2007, Australia; (2) Department of Computer Science, The University of Iowa, Iowa City, IA 52242, USA; (3) Yahoo! Labs, Sunnyvale, CA 94089, USA
Pseudocode | Yes | Algorithm 1: A Framework of Online Multiple Cost-sensitive Learning
Open Source Code | No | The paper does not provide any concrete access information (e.g., repository links, explicit statements of code release) for the source code of the described methodology.
Open Datasets | Yes | We compare the proposed OMCSL method with several state of the art online learning algorithms... on three public imbalanced datasets. Table 2 lists the statistics of used three datasets. To construct imbalanced data from multiclass datasets covtype, we sample instances of the fifth class as positive and instances of the first class as negative, denoted by covtype1v5. Similarly, for aloi, we sample instances of the first class as positive, and the rest as negative, denoted by aloi-1.
Dataset Splits | No | For each dataset, we randomly sample 4/5 instances as the training set and the rest 1/5 as the testing set. The paper specifies training and testing splits but does not mention a separate validation split.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments, only general mentions of experimentation.
Software Dependencies | No | The paper does not provide specific software dependencies or their version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed to replicate the experiment.
Experiment Setup | No | The details of hyperparameters of these methods can be found in Appendix D. While hyperparameters are mentioned, their specific values are not provided in the main text.
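
Because the paper releases no code, the dataset construction and the 4/5 to 1/5 split quoted in the table can only be approximated. The following is a minimal sketch of one way to do it, not the authors' implementation. The function names, the fixed seed, the toy label list, and the choice to keep every instance of the selected classes (rather than subsample) are all assumptions made for illustration.

```python
import random


def one_vs_one(labels, positive, negative):
    """Keep only two classes and map them to +1 / -1 (covtype1v5-style).

    Returns a list of (original_index, binary_label) pairs.
    """
    return [(i, 1 if y == positive else -1)
            for i, y in enumerate(labels) if y in (positive, negative)]


def one_vs_rest(labels, positive):
    """Keep every instance; the chosen class is +1, all others -1 (aloi-1-style)."""
    return [(i, 1 if y == positive else -1) for i, y in enumerate(labels)]


def split_train_test(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of the items and cut it into train and test parts.

    The seed is an assumption; the paper does not report one.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy labels standing in for a multiclass dataset (not real covtype data).
toy_labels = [1, 1, 5, 2, 1, 3, 1, 5, 1, 1]
pairs = one_vs_one(toy_labels, positive=5, negative=1)
train, test = split_train_test(pairs)
```

With real data, `toy_labels` would be replaced by the label column of the multiclass dataset, and the returned indices would be used to select the matching feature rows. No validation split is produced, which matches the "Dataset Splits" row above.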