A Clustering-based framework for Classifying Data Streams

Authors: Xuyang Yan, Abdollah Homaifar, Mrinmoy Sarkar, Abenezer Girma, Edward Tunstel

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, experiments are conducted on the CDSC-AL framework using nine benchmark datasets, and comparison studies with the state-of-the-art methods are presented.
Researcher Affiliation | Collaboration | (1) North Carolina A&T State University, Greensboro, NC, 27401, USA; (2) Raytheon Technologies Research Center, East Hartford, CT, 06108, USA
Pseudocode | Yes | Algorithm 1: An overview of CDSC-AL framework; Algorithm 2: New cluster merge procedure; Algorithm 3: Adaptation of drifted and novel concepts; Algorithm 4: Classification through label propagation
Open Source Code | Yes | The Python code of the CDSC-AL framework is available at https://github.com/XuyangAbert/CDSC-AL.
Open Datasets | Yes | Nine multi-class benchmark datasets, including three synthetic datasets and six well-known real datasets from [Dheeru and Karra Taniskidou, 2017], are used in the experiments for performance evaluation. Table 1 summarizes these datasets in terms of sample size, dimensionality, number of classes, and class overlap.
Dataset Splits | No | The paper describes partitioning the data stream into chunks and using a small portion of labels for semi-supervised learning. However, it does not provide explicit training/validation/test splits (percentages, sample counts, or cross-validation details) for static dataset partitioning.
Hardware Specification | Yes | All experiments are conducted on an Intel Xeon(R) machine with 64 GB RAM running Microsoft Windows 10.
Software Dependencies | No | The paper mentions MATLAB, the MOA framework, and Python code, but it does not provide version numbers for these software components or for any associated libraries.
Experiment Setup | Yes | For the semi-supervised methods and CDSC-AL, the portion of labeled data in each incoming data chunk is set to 10%. For supervised methods, the labels of all samples in an incoming data chunk are provided to update the classifier after classification, while CDSC-AL utilizes only 10% labeled data.
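The labeling protocol in the Experiment Setup row can be illustrated with a small sketch. The setup is chunk-based: every sample in a chunk is classified first, and only then are some labels revealed for updating. This sketch assumes the stream is split into fixed-size, consecutive chunks and that the labeled 10% of each chunk is drawn uniformly at random. The text above states neither the chunk size nor the selection rule, so both are assumptions, as is the helper name `label_schedule`. This is not the authors' code.

```python
import random
from typing import List, Tuple


def label_schedule(n_samples: int, chunk_size: int, label_ratio: float,
                   seed: int = 0) -> List[Tuple[range, List[int]]]:
    """Hypothetical helper: split a stream of n_samples into consecutive chunks
    and choose, per chunk, which sample indices have their labels revealed
    after classification.

    label_ratio=0.10 mimics the CDSC-AL / semi-supervised setting;
    label_ratio=1.0 mimics the fully supervised baselines.
    Uniform random selection within each chunk is an assumption.
    """
    if not 0.0 < label_ratio <= 1.0:
        raise ValueError("label_ratio must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible
    schedule = []
    for start in range(0, n_samples, chunk_size):
        # Consecutive, non-overlapping chunk; the last one may be shorter.
        chunk = range(start, min(start + chunk_size, n_samples))
        # At least one labeled sample per chunk, never more than the chunk holds.
        k = min(len(chunk), max(1, round(label_ratio * len(chunk))))
        revealed = sorted(rng.sample(list(chunk), k))
        schedule.append((chunk, revealed))
    return schedule


if __name__ == "__main__":
    for chunk, revealed in label_schedule(n_samples=1000, chunk_size=200, label_ratio=0.10):
        # Step 1 (omitted): classify every sample in `chunk` with the current model.
        # Step 2 (omitted): update the model using the labels of `revealed` only.
        print(f"chunk {chunk.start}-{chunk.stop - 1}: "
              f"{len(revealed)} of {len(chunk)} labels revealed")
```

With `label_ratio=1.0`, every index in a chunk is revealed. That corresponds to the supervised baselines, which receive all labels of a chunk after classifying it.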