A Clustering-based framework for Classifying Data Streams
Authors: Xuyang Yan, Abdollah Homaifar, Mrinmoy Sarkar, Abenezer Girma, Edward Tunstel
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, experiments are conducted on the CDSC-AL framework using nine benchmark datasets, and comparison studies with the state-of-the-art methods are presented. |
| Researcher Affiliation | Collaboration | (1) North Carolina A&T State University, Greensboro, NC 27401, USA; (2) Raytheon Technologies Research Center, East Hartford, CT 06108, USA |
| Pseudocode | Yes | Algorithm 1 An overview of CDSC-AL framework, Algorithm 2 New cluster merge procedure, Algorithm 3 Adaptation of drifted and novel concepts, Algorithm 4 Classification through label propagation |
| Open Source Code | Yes | The Python code of the CDSC-AL framework is available at https://github.com/XuyangAbert/CDSC-AL. |
| Open Datasets | Yes | Nine multi-class benchmark datasets, including three synthetic datasets and six well-known real datasets from [Dheeru and Karra Taniskidou, 2017], are used in the experiments for performance evaluation. Table 1 summarizes these datasets in terms of sample size, dimensionality, number of classes, and class overlap. |
| Dataset Splits | No | The paper describes partitioning data into chunks and using a small portion of labels for semi-supervised learning, but it does not provide explicit training/validation/test dataset splits with percentages, sample counts, or cross-validation details for static dataset partitioning. |
| Hardware Specification | Yes | All experiments are conducted on an Intel Xeon(R) machine with 64 GB of RAM running Microsoft Windows 10. |
| Software Dependencies | No | The paper mentions MATLAB, MOA framework, and Python code but does not provide specific version numbers for these software components or any associated libraries. |
| Experiment Setup | Yes | For the semi-supervised methods and CDSC-AL, the portion of labeled data in each incoming data chunk is set to 10%. For supervised methods, the labels of all samples in an incoming data chunk are provided to update the classifier after classification, whereas CDSC-AL utilizes only 10% labeled data. |
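
The setup row describes a semi-supervised stream protocol in which only 10% of the samples in each incoming chunk have their labels revealed. The sketch below generates such a schedule under assumptions the table does not confirm: fixed-size consecutive chunks, labeled samples drawn uniformly at random within each chunk, and at least one labeled sample per chunk. The names `chunk_stream` and `labeled_schedule`, the chunk size of 500, and the seed are hypothetical illustrations, not taken from the CDSC-AL repository.

```python
import random
from typing import Iterator, List, Tuple


def chunk_stream(n_samples: int, chunk_size: int) -> Iterator[range]:
    """Split sample indices 0..n_samples-1 into consecutive chunks (the last may be shorter)."""
    for start in range(0, n_samples, chunk_size):
        yield range(start, min(start + chunk_size, n_samples))


def labeled_schedule(n_samples: int, chunk_size: int, label_fraction: float = 0.10,
                     seed: int = 0) -> List[Tuple[range, List[int]]]:
    """For each chunk, pick the sample indices whose labels are revealed.

    Assumption (not from the paper): labeled samples are drawn uniformly at
    random within each chunk, with at least one labeled sample per chunk.
    """
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    schedule = []
    for chunk in chunk_stream(n_samples, chunk_size):
        k = max(1, round(label_fraction * len(chunk)))
        labeled = sorted(rng.sample(list(chunk), k))
        schedule.append((chunk, labeled))
    return schedule


if __name__ == "__main__":
    for chunk, labeled in labeled_schedule(n_samples=1050, chunk_size=500):
        # Assumed test-then-train order: classify every sample in `chunk` first,
        # then update the model using only the samples listed in `labeled`.
        print(f"chunk {chunk.start}-{chunk.stop - 1}: {len(labeled)} labeled of {len(chunk)}")
```

With 1,050 samples and a chunk size of 500, this yields chunks of 500, 500, and 50 samples with 50, 50, and 5 labeled samples respectively. A supervised baseline, as described in the setup row, would instead receive the labels of every index in `chunk` after classification.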
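
The pseudocode row lists Algorithm 4, "Classification through label propagation", but the table gives no detail of how CDSC-AL propagates labels. As a generic illustration only, the classic iterative label-propagation scheme can be sketched as follows: build a Gaussian affinity graph over the points of one chunk, row-normalize it, repeatedly average each point's class distribution over its neighbors, and clamp the known labels after every step. This is not a reconstruction of CDSC-AL's Algorithm 4; the function name `label_propagation`, the kernel width `sigma`, and the stopping rule are assumptions.

```python
import math
from typing import Dict, List, Sequence


def label_propagation(points: Sequence[Sequence[float]], labeled: Dict[int, int],
                      n_classes: int, sigma: float = 1.0,
                      max_iter: int = 200, tol: float = 1e-6) -> List[int]:
    """Generic (hypothetical) label propagation with clamped labeled points.

    points:  feature vectors of one chunk, labeled and unlabeled together
    labeled: maps a point index to its known class in 0..n_classes-1
    Returns the predicted class index for every point.
    """
    n = len(points)
    if n == 0:
        return []
    # Gaussian affinity between every pair of distinct points (zero diagonal).
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    # Row-normalize so each row is a probability distribution over neighbors.
    for row in w:
        total = sum(row)
        if total > 0.0:
            row[:] = [x / total for x in row]

    def one_hot(c: int) -> List[float]:
        return [1.0 if k == c else 0.0 for k in range(n_classes)]

    # Labeled points start one-hot; unlabeled points start uniform.
    f = [one_hot(labeled[i]) if i in labeled else [1.0 / n_classes] * n_classes
         for i in range(n)]
    for _ in range(max_iter):
        # Each point takes the affinity-weighted average of its neighbors' distributions.
        new_f = [[sum(w[i][j] * f[j][k] for j in range(n)) for k in range(n_classes)]
                 for i in range(n)]
        # Clamp: known labels never change.
        for i, c in labeled.items():
            new_f[i] = one_hot(c)
        delta = max(abs(new_f[i][k] - f[i][k]) for i in range(n) for k in range(n_classes))
        f = new_f
        if delta < tol:
            break
    # Predicted class = index of the largest propagated score.
    return [max(range(n_classes), key=lambda k: row[k]) for row in f]
```

For two well-separated groups of points with one labeled point in each, every unlabeled point inherits the label of its own group, because cross-group affinities are vanishingly small compared with within-group ones. The pairwise loops cost O(n²) per iteration, so this pure-Python form is only meant for small illustrative chunks.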