Multistream Classification with Relative Density Ratio Estimation

Authors: Bo Dong, Yang Gao, Swarup Chandra, Latifur Khan. Pages 3478-3485.

AAAI 2019

Reproducibility Variable Result LLM Response
Research Type: Experimental. "We theoretically study its properties and empirically demonstrate its superior performance, within a multistream framework called MSCRDR, on benchmark datasets by comparing with other competing methods." "Experimental Evaluation. Datasets: Table 1 lists the real-world and synthetic datasets used to evaluate our approach."
Researcher Affiliation: Academia. Bo Dong, Yang Gao, Swarup Chandra, Latifur Khan, Department of Computer Science, University of Texas at Dallas, Richardson, TX. {bxd130630, yxg122530, swarup.chandra, lkhan}@utdallas.edu
Pseudocode: Yes. Algorithm 1 (Multistream Classification); Algorithm 2 (Learn Parameter).
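The algorithm names suggest the standard multistream setting: a labeled source stream trains a model, an unlabeled target stream is classified, and instance weights correct the source/target distribution bias. The skeleton below is a hedged guess at that generic setting, not the authors' exact Algorithm 1; the weighted-centroid learner is a toy stand-in for the paper's weighted SVM, and all function names are illustrative.

```python
def weighted_centroid_fit(points, labels, weights):
    # Weighted per-class centroids; a toy stand-in for a weighted SVM.
    cents = {}
    for lab in set(labels):
        rows = [(p, w) for p, l, w in zip(points, labels, weights) if l == lab]
        tot = sum(w for _, w in rows)
        cents[lab] = tuple(sum(p[i] * w for p, w in rows) / tot
                           for i in range(len(points[0])))
    return cents

def predict(cents, x):
    # Nearest weighted centroid by squared Euclidean distance.
    return min(cents, key=lambda lab: sum((xi - ci) ** 2
                                          for xi, ci in zip(x, cents[lab])))

def multistream_loop(source, target, weight_fn, warmup):
    """Warm up on the labeled source stream, weight each source instance to
    correct source/target distribution bias, then classify the target stream."""
    src = source[:warmup]
    xs, ys = [x for x, _ in src], [y for _, y in src]
    model = weighted_centroid_fit(xs, ys, weight_fn(xs))
    return [predict(model, x) for x in target]
```

In the full setting the loop would also monitor for drift and retrain the model on a sliding window, which this sketch omits.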
Open Source Code: No. The paper does not provide any explicit statement or link regarding the public availability of its source code.
Open Datasets: Yes. Forest Cover is a benchmark dataset from the UCI repository containing geospatial descriptions of different forest types. The Yelp@X dataset (Shrestha, Mukherjee, and Solorio 2018) contains the customer id, reviews, and ratings for different Hotel/Restaurant businesses on the Yelp website. The Amazon@X dataset (Blitzer, Dredze, and Pereira 2007) contains the timestamps, reviews, and ratings for different Music/DVD/Electronics products on the Amazon website. Syn RBF@X are synthetic datasets generated using the RandomRBFGeneratorDrift of the MOA framework (Bifet et al. 2010).
Dataset Splits: No. The paper mentions dividing data into source and target streams (a 1:9 ratio) and an 'initialization set (warm-up phase)', but does not provide specific train/validation/test split percentages or counts for model training and evaluation.
Hardware Specification: No. The paper does not provide any specific details about the hardware used to run the experiments.
Software Dependencies: No. The paper mentions software such as word2vec, the MOA (Massive Online Analysis) framework, and a weighted SVM, but does not provide version numbers for any of them.
Experiment Setup: Yes. Since data instances occur continuously along the stream, an initial warm-up phase is used to begin classification, and the classifier model is subsequently adapted to changes in data distribution throughout its lifetime. As illustrated in Figure 1, an ensemble of classifiers performs label prediction. Parameters of all baselines were set by cross-validation on the initialization set (warm-up phase) of each dataset. Figure 5 indicates that accuracy increases with window size, and that the number of classifiers in the ensemble E affects both accuracy and execution time on the Syn RBF@002 dataset; accuracy is highest when E = 2.
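The paper's title refers to relative density-ratio estimation, the general technique for weighting source instances so they match the target distribution. Below is a minimal RuLSIF-style sketch of that general technique, not the authors' MSCRDR implementation; the kernel choice, hyperparameters, and function names are all illustrative assumptions. It estimates the alpha-relative ratio r(x) = p(x) / (alpha*p(x) + (1-alpha)*q(x)) by regularized least squares over Gaussian kernels.

```python
import numpy as np

def gaussian_kernel(X, C, sigma):
    # Pairwise Gaussian kernel between samples X and centers C.
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def rulsif_weights(X_num, X_den, alpha=0.1, sigma=1.0, lam=0.1, n_centers=50):
    """Fit theta minimizing the regularized least-squares objective so that
    K(x) @ theta approximates the alpha-relative density ratio of
    X_num's distribution over X_den's distribution."""
    rng = np.random.default_rng(0)
    idx = rng.choice(len(X_num), size=min(n_centers, len(X_num)), replace=False)
    C = X_num[idx]                       # kernel centers from the numerator sample
    K_num = gaussian_kernel(X_num, C, sigma)
    K_den = gaussian_kernel(X_den, C, sigma)
    H = (alpha * K_num.T @ K_num / len(X_num)
         + (1.0 - alpha) * K_den.T @ K_den / len(X_den))
    h = K_num.mean(axis=0)
    theta = np.linalg.solve(H + lam * np.eye(len(C)), h)
    # Clip at zero: a density ratio cannot be negative.
    return lambda X: np.maximum(gaussian_kernel(X, C, sigma) @ theta, 0.0)
```

In a multistream setting, the estimated ratio evaluated on source-stream instances would serve as their training weights, so that a model fit on the source better reflects the target distribution.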