DriftSurf: Stable-State / Reactive-State Learning under Concept Drift

Authors: Ashraf Tahmasbi, Ellango Jothimurugesan, Srikanta Tirthapura, Phillip B. Gibbons

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present a theoretical analysis of DriftSurf, showing that it is risk-competitive with Aware, an adaptive algorithm that has oracle access to when a drift occurs and at each time step maintains a model trained over the set of all data since the previous drift. We also provide experimental comparisons of DriftSurf to Aware and two adaptive learning algorithms: a state-of-the-art drift-detection-based method MDDM and a state-of-the-art ensemble method AUE. Our results on 10 synthetic and real-world datasets with concept drifts confirm our theoretical analysis.
Researcher Affiliation | Collaboration | (1) Department of Electrical and Computer Engineering, Iowa State University, Ames, Iowa, USA; (2) Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA; (3) Apple Inc., Cupertino, California, USA.
Pseudocode | Yes | Our algorithm, DriftSurf, is depicted in Algorithm 1, which is executed when DriftSurf is in the stable state, and Algorithm 2, which is executed when DriftSurf is in the reactive state. (An illustrative sketch of this two-state design appears after the table.)
Open Source Code | No | The paper does not provide an explicit statement about the release of its source code or a link to a code repository for the methodology described.
Open Datasets | Yes | We use five synthetic, two semi-synthetic and three real datasets for binary classification, chosen to include all such datasets that the authors of MDDM and AUE use in their evaluations. These datasets include both abrupt and gradual drifts. Drifts in semi-synthetic datasets are generated by rotating data points or changing the labels of the real-world datasets that originally do not contain any drift. More detail on the datasets is provided in Appendix C.2. ... Dua, D. and Graff, C. UCI machine learning repository, 2017. URL http://archive.ics.uci.edu/ml. ... Harries, M. Splice-2 comparative evaluation: Electricity pricing. Technical report, University of New South Wales, 1999. ... Ikonomovska, E. Airline dataset. URL http://kt.ijs.si/elena_ikonomovska/data.html. (Accessed on 02/06/2020). ... Lewis, D. D., Yang, Y., Rose, T. G., and Li, F. RCV1: A new benchmark collection for text categorization research. JMLR, 5:361–397, 2004.
Dataset Splits | No | The paper states that it divides datasets into equally-sized batches and uses a test-then-train approach, but it does not provide specific percentages or counts for training, validation, or test splits, nor does it mention cross-validation. (A sketch of the test-then-train protocol appears after the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments.
Software Dependencies | No | The paper mentions software components such as logistic regression, STRSAGA, SGD, Hoeffding Trees, and Naive Bayes classifiers, but does not provide specific version numbers for any of these or their underlying libraries/frameworks.
Experiment Setup | No | The paper states that hyperparameter settings are provided in Appendix C.3 and describes general experimental procedures (batch processing, test-then-train evaluation, median over five trials), but does not provide specific hyperparameter values or detailed training configurations in the main text.
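
The Pseudocode row above describes DriftSurf's two-state design: Algorithm 1 runs in the stable state and Algorithm 2 in the reactive state. The following is a minimal, hedged sketch of that stable/reactive switching, not the paper's exact Algorithms 1 and 2: the drift-suspicion test, the reactive-window length `reactive_len`, the tolerance `degrade_tol`, and the `make_model`/`process_batch` interface are illustrative assumptions.

```python
# Illustrative sketch only; thresholds and the incremental-model interface are
# assumptions, not the paper's exact Algorithm 1/2.

class StableReactiveLearner:
    def __init__(self, make_model, reactive_len=4, degrade_tol=0.05):
        self.make_model = make_model      # factory returning a fresh incremental model
        self.predictive = make_model()    # model currently used for predictions
        self.reactive = None              # candidate model grown during the reactive state
        self.state = "stable"
        self.reactive_len = reactive_len  # number of batches spent in the reactive state
        self.degrade_tol = degrade_tol    # accuracy drop that triggers the reactive state
        self.best_acc = 0.0               # best accuracy seen in the current stable period
        self.steps_reactive = 0

    @staticmethod
    def _acc(model, X, y):
        try:
            return model.score(X, y)      # mean accuracy for sklearn-style models
        except Exception:
            return 0.0                    # model has not been trained on any data yet

    def process_batch(self, X, y):
        """Test-then-train on one batch; returns the pre-training accuracy."""
        acc = self._acc(self.predictive, X, y)

        if self.state == "stable":
            self.best_acc = max(self.best_acc, acc)
            if acc < self.best_acc - self.degrade_tol:
                # Suspected drift: enter the reactive state with a fresh model.
                self.state, self.reactive, self.steps_reactive = "reactive", self.make_model(), 0
        else:
            self.steps_reactive += 1
            if self.steps_reactive >= self.reactive_len:
                # End of the reactive state: keep whichever model did better on this batch.
                if self._acc(self.reactive, X, y) > acc:
                    self.predictive, self.best_acc = self.reactive, 0.0
                self.reactive, self.state = None, "stable"

        # Both models (when present) are trained on every incoming batch.
        self.predictive.partial_fit(X, y, classes=[0, 1])   # binary labels, as in the paper
        if self.reactive is not None:
            self.reactive.partial_fit(X, y, classes=[0, 1])
        return acc
```

Any incremental classifier with a `partial_fit`/`score` interface could stand in for `make_model`; the paper itself reports results with STRSAGA, SGD, Hoeffding trees, and naive Bayes learners.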
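
The Dataset Splits and Experiment Setup rows note that evaluation is prequential: the stream is divided into equally sized batches, each batch is tested on before being trained on, and results are reported as the median over five trials. Below is a hedged sketch of that loop under the assumptions above; the batch size and the `process_batch` interface are placeholders, since the paper's concrete settings are in its appendix.

```python
import numpy as np

def test_then_train(learner, X, y, batch_size=1000):
    """Prequential evaluation: each batch is scored before the learner trains on it."""
    accs = []
    for start in range(0, len(X), batch_size):
        Xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        accs.append(learner.process_batch(Xb, yb))   # evaluate, then train
    return accs

# Hypothetical usage, assuming a binary-labeled stream (X, y) is already loaded:
# from sklearn.linear_model import SGDClassifier
# trials = [np.mean(test_then_train(
#               StableReactiveLearner(lambda: SGDClassifier(loss="log_loss")), X, y))
#           for _ in range(5)]
# print("median accuracy over five trials:", np.median(trials))
```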