Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

A Clustering-based Framework for Classifying Data Streams

Authors: Xuyang Yan, Abdollah Homaifar, Mrinmoy Sarkar, Abenezer Girma, Edward Tunstel

IJCAI 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, experiments are conducted on the CDSC-AL framework using nine benchmark datasets, and comparison studies with the state-of-the-art methods are presented.
Researcher Affiliation	Collaboration	(1) North Carolina A&T State University, Greensboro, NC, 27401, USA; (2) Raytheon Technologies Research Center, East Hartford, CT, 06108, USA
Pseudocode	Yes	Algorithm 1: An overview of CDSC-AL framework; Algorithm 2: New cluster merge procedure; Algorithm 3: Adaptation of drifted and novel concepts; Algorithm 4: Classification through label propagation
Open Source Code	Yes	The python code of the CDSC-AL framework is available at the link (footnote 1: https://github.com/XuyangAbert/CDSC-AL).
Open Datasets	Yes	Nine multi-class benchmark datasets, including three synthetic datasets and six well-known real datasets from [Dheeru and Karra Taniskidou, 2017], are used in the experiments for performance evaluation. Table 1 summarizes these datasets in terms of sample size, dimensionality, number of classes, and class overlap.
Dataset Splits	No	The paper describes partitioning data into chunks and using a small portion of labels for semi-supervised learning, but it does not provide explicit training/validation/test dataset splits with percentages, sample counts, or cross-validation details for static dataset partitioning.
Hardware Specification	Yes	All experiments are conducted on an Intel Xeon (R) machine with 64GB RAM operating on Microsoft Windows 10.
Software Dependencies	No	The paper mentions MATLAB, MOA framework, and Python code but does not provide specific version numbers for these software components or any associated libraries.
Experiment Setup	Yes	For the semi-supervised methods and CDSC-AL, the portion of labeled data of each incoming data chunk is set as 10%. For supervised methods, the labels of all samples from an incoming data chunk are provided to update the classifier after classification while CDSC-AL utilized only 10% labeled data.
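
The chunk-based, partially labeled setup described in the Dataset Splits and Experiment Setup rows can be illustrated with a minimal sketch. It is not taken from the CDSC-AL repository. The function names, the chunk size of 100, and the uniform random choice of which labels to reveal are all assumptions made for illustration; the paper may select the labeled samples differently. Only the 10% labeled fraction per chunk comes from the text above.

```python
import random
from typing import Iterator, List, Optional, Sequence


def iter_chunks(stream: Sequence, chunk_size: int) -> Iterator[Sequence]:
    """Yield consecutive, non-overlapping chunks of a stream (hypothetical helper)."""
    for start in range(0, len(stream), chunk_size):
        yield stream[start:start + chunk_size]


def labeled_indices(chunk_len: int, fraction: float, rng: random.Random) -> List[int]:
    """Choose which positions in a chunk have their labels revealed.

    Assumption: positions are drawn uniformly at random; at least one
    label is revealed per non-empty chunk.
    """
    n_labeled = min(chunk_len, max(1, round(fraction * chunk_len)))
    return sorted(rng.sample(range(chunk_len), n_labeled))


def mask_labels(labels: Sequence, fraction: float, rng: random.Random) -> List[Optional[object]]:
    """Return a copy of the labels with unrevealed entries replaced by None."""
    revealed = set(labeled_indices(len(labels), fraction, rng))
    return [y if i in revealed else None for i, y in enumerate(labels)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy label stream with three classes; real streams would carry features too.
    stream_labels = [i % 3 for i in range(250)]
    for k, chunk in enumerate(iter_chunks(stream_labels, chunk_size=100)):
        masked = mask_labels(chunk, fraction=0.10, rng=rng)
        n_revealed = sum(m is not None for m in masked)
        print(f"chunk {k}: size={len(chunk)}, labels revealed={n_revealed}")
```

Under this reading, a semi-supervised learner would see only the revealed labels of each chunk, whereas the supervised baselines would receive every label of the chunk after classifying it.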