Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
A Clustering-based framework for Classifying Data Streams
Authors: Xuyang Yan, Abdollah Homaifar, Mrinmoy Sarkar, Abenezer Girma, Edward Tunstel
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, experiments are conducted on the CDSC-AL framework using nine benchmark datasets, and comparison studies with the state-of-the-art methods are presented. |
| Researcher Affiliation | Collaboration | ¹North Carolina A&T State University, Greensboro, NC, 27401, USA; ²Raytheon Technologies Research Center, East Hartford, CT, 06108, USA |
| Pseudocode | Yes | Algorithm 1: An overview of CDSC-AL framework; Algorithm 2: New cluster merge procedure; Algorithm 3: Adaptation of drifted and novel concepts; Algorithm 4: Classification through label propagation |
| Open Source Code | Yes | The Python code of the CDSC-AL framework is available at the link [1]. [1] https://github.com/XuyangAbert/CDSC-AL |
| Open Datasets | Yes | Nine multi-class benchmark datasets, including three synthetic datasets and six well-known real datasets from [Dheeru and Karra Taniskidou, 2017], are used in the experiments for performance evaluation. Table 1 summarizes these datasets in terms of sample size, dimensionality, number of classes, and class overlap. |
| Dataset Splits | No | The paper describes partitioning data into chunks and using a small portion of labels for semi-supervised learning, but it does not provide explicit training/validation/test dataset splits with percentages, sample counts, or cross-validation details for static dataset partitioning. |
| Hardware Specification | Yes | All experiments are conducted on an Intel Xeon (R) machine with 64GB RAM operating on Microsoft Windows 10. |
| Software Dependencies | No | The paper mentions MATLAB, MOA framework, and Python code but does not provide specific version numbers for these software components or any associated libraries. |
| Experiment Setup | Yes | For the semi-supervised methods and CDSC-AL, the portion of labeled data of each incoming data chunk is set as 10%. For supervised methods, the labels of all samples from an incoming data chunk are provided to update the classifier after classification while CDSC-AL utilized only 10% labeled data. |
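
The chunk-wise protocol summarized in the Dataset Splits and Experiment Setup rows (data arrives in chunks, and only 10% of each chunk's labels are revealed) can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names `iter_chunks` and `mask_labels`, the chunk size, the random selection of labeled samples, and the `-1` marker for unlabeled samples are all assumptions made here; the table does not say how the labeled samples are chosen.

```python
import numpy as np

def iter_chunks(X, y, chunk_size):
    """Yield consecutive (features, labels) chunks of a data stream.

    chunk_size is a hypothetical parameter; the summary above does not
    state the chunk size used in the paper.
    """
    for start in range(0, len(X), chunk_size):
        yield X[start:start + chunk_size], y[start:start + chunk_size]

def mask_labels(y_chunk, labeled_fraction=0.10, rng=None, unlabeled=-1):
    """Reveal labels for a random fraction of a chunk and hide the rest.

    Random selection is an assumption of this sketch. Returns the
    partially labeled vector and the indices whose labels were kept.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(y_chunk)
    n_labeled = max(1, int(round(labeled_fraction * n)))
    labeled_idx = rng.choice(n, size=n_labeled, replace=False)
    y_partial = np.full(n, unlabeled, dtype=y_chunk.dtype)
    y_partial[labeled_idx] = y_chunk[labeled_idx]
    return y_partial, labeled_idx
```

A semi-supervised learner would receive `y_partial` for each chunk, whereas the supervised baselines described above would receive the full label vector after classification.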