Multistream Classification with Relative Density Ratio Estimation

Authors: Bo Dong, Yang Gao, Swarup Chandra, Latifur Khan. Pages 3478-3485.

AAAI 2019

Reproducibility Variable Result LLM Response
Research Type: Experimental. "We theoretically study its properties and empirically demonstrate its superior performance, within a multistream framework called MSCRDR, on benchmark datasets by comparing with other competing methods." "Experimental Evaluation. Datasets: Table 1 lists the real-world and synthetic datasets used to evaluate our approach."
Researcher Affiliation: Academia. Bo Dong, Yang Gao, Swarup Chandra, Latifur Khan, Department of Computer Science, University of Texas at Dallas, Richardson, TX. {bxd130630, yxg122530, swarup.chandra, lkhan}@utdallas.edu
Pseudocode: Yes. Algorithm 1 (Multistream Classification); Algorithm 2 (Learn Parameter).
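The algorithm names suggest the standard multistream setting: a labeled source stream trains a model, an unlabeled target stream is classified, and instance weights correct the source/target distribution bias. The skeleton below is a hedged guess at that generic setting, not the authors' exact Algorithm 1; the weighted-centroid learner is a toy stand-in for the paper's weighted SVM, and all function names are illustrative.

```python
def weighted_centroid_fit(points, labels, weights):
    # Weighted per-class centroids; a toy stand-in for a weighted SVM.
    cents = {}
    for lab in set(labels):
        rows = [(p, w) for p, l, w in zip(points, labels, weights) if l == lab]
        tot = sum(w for _, w in rows)
        cents[lab] = tuple(sum(p[i] * w for p, w in rows) / tot
                           for i in range(len(points[0])))
    return cents

def predict(cents, x):
    # Nearest weighted centroid by squared Euclidean distance.
    return min(cents, key=lambda lab: sum((xi - ci) ** 2
                                          for xi, ci in zip(x, cents[lab])))

def multistream_loop(source, target, weight_fn, warmup):
    """Warm up on the labeled source stream, weight each source instance to
    correct source/target distribution bias, then classify the target stream."""
    src = source[:warmup]
    xs, ys = [x for x, _ in src], [y for _, y in src]
    model = weighted_centroid_fit(xs, ys, weight_fn(xs))
    return [predict(model, x) for x in target]
```

In the full setting the loop would also monitor for drift and retrain the model on a sliding window, which this sketch omits.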
Open Source Code: No. The paper does not provide any explicit statement or link regarding the public availability of its source code.
Open Datasets: Yes. Forest Cover is a benchmark dataset from the UCI repository containing geospatial descriptions of different forest types. The Yelp@X dataset (Shrestha, Mukherjee, and Solorio 2018) contains the customer id, reviews, and ratings for different Hotel/Restaurant businesses on the Yelp website. The Amazon@X dataset (Blitzer, Dredze, and Pereira 2007) contains the timestamps, reviews, and ratings for different Music/DVD/Electronics products on the Amazon website. Syn RBF@X are synthetic datasets generated using the RandomRBFGeneratorDrift of the MOA framework (Bifet et al. 2010).
Dataset Splits: No. The paper mentions dividing data into source and target streams (a 1:9 ratio) and an 'initialization set (warm-up phase)', but does not provide specific train/validation/test split percentages or counts for model training and evaluation.
Hardware Specification: No. The paper does not provide any specific details about the hardware used to run the experiments.
Software Dependencies: No. The paper mentions software such as word2vec, the MOA (Massive Online Analysis) framework, and a weighted SVM, but does not provide version numbers for any of them.
Experiment Setup: Yes. Since data instances occur continuously along the stream, an initial warm-up phase is used to begin classification, and the classifier model is subsequently adapted to changes in data distribution throughout its lifetime. As illustrated in Figure 1, an ensemble of classifiers performs label prediction. Parameters of all baselines were set by cross-validation on the initialization set (warm-up phase) of each dataset. Figure 5 indicates that accuracy increases with window size, and that the number of classifiers in the ensemble E affects both accuracy and execution time on the Syn RBF@002 dataset; accuracy is highest when E = 2.
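The paper's title refers to relative density-ratio estimation, the general technique for weighting source instances so they match the target distribution. Below is a minimal RuLSIF-style sketch of that general technique, not the authors' MSCRDR implementation; the kernel choice, hyperparameters, and function names are all illustrative assumptions. It estimates the alpha-relative ratio r(x) = p(x) / (alpha*p(x) + (1-alpha)*q(x)) by regularized least squares over Gaussian kernels.

```python
import numpy as np

def gaussian_kernel(X, C, sigma):
    # Pairwise Gaussian kernel between samples X and centers C.
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def rulsif_weights(X_num, X_den, alpha=0.1, sigma=1.0, lam=0.1, n_centers=50):
    """Fit theta minimizing the regularized least-squares objective so that
    K(x) @ theta approximates the alpha-relative density ratio of
    X_num's distribution over X_den's distribution."""
    rng = np.random.default_rng(0)
    idx = rng.choice(len(X_num), size=min(n_centers, len(X_num)), replace=False)
    C = X_num[idx]                       # kernel centers from the numerator sample
    K_num = gaussian_kernel(X_num, C, sigma)
    K_den = gaussian_kernel(X_den, C, sigma)
    H = (alpha * K_num.T @ K_num / len(X_num)
         + (1.0 - alpha) * K_den.T @ K_den / len(X_den))
    h = K_num.mean(axis=0)
    theta = np.linalg.solve(H + lam * np.eye(len(C)), h)
    # Clip at zero: a density ratio cannot be negative.
    return lambda X: np.maximum(gaussian_kernel(X, C, sigma) @ theta, 0.0)
```

In a multistream setting, the estimated ratio evaluated on source-stream instances would serve as their training weights, so that a model fit on the source better reflects the target distribution.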