Cogra: Concept-Drift-Aware Stochastic Gradient Descent for Time-Series Forecasting

Authors: Kohei Miyaguchi, Hiroshi Kajino

AAAI 2019, pp. 4594-4601 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | As a result of comprehensive experiments, we find that (i) our SMT can estimate the mean better than vSGD's estimator in the presence of concept drift, and (ii) in terms of predictive performance, Cogra reduces the predictive loss by 16-67% for real-world datasets, indicating that SMT improves the prediction accuracy significantly. The effectiveness of our method is empirically validated by extensive simulations. Specifically, we design two experiments to answer the following questions: (Q1) Does SMT estimate the moments better than the estimator used in vSGD? (Q2) Does SMT improve the predictive performance? (Q3) When does Cogra outperform the other SGDs? The first experiment, answering (Q1), evaluates the estimation error of a moment on synthetic data. The result shows that SMT decreases the error by 60% in total, measured by squared loss, as compared to vSGD, answering (Q1) in the affirmative. The second one, answering (Q2) and (Q3), evaluates the predictive performance on both synthetic and real-world data.
Researcher Affiliation | Collaboration | Kohei Miyaguchi, The University of Tokyo, Tokyo, Japan (kohei_miyaguchi@mist.i.u-tokyo.ac.jp); Hiroshi Kajino, IBM Research Tokyo, Tokyo, Japan (kajino@jp.ibm.com)
Pseudocode | Yes | Algorithm 1: Sequential Mean Tracker (SMT); Algorithm 2: Cogra algorithm (an illustrative sketch follows the table)
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We employ VAR(3) as a predictive model and use three datasets from the UCI repository (Lichman 2013). The activity recognition dataset records three-dimensional acceleration data (Casale, Pujol, and Radeva 2012). The gas sensor array dataset (Fonollosa et al. 2015) collects the recordings of 18 chemical sensors exposed to gas mixtures with dynamically varying concentrations. (An online VAR sketch follows the table.)
Dataset Splits | No | The paper describes using synthetic and real-world datasets and an online learning procedure where the model predicts the next data point, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or predefined split references).
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models, memory, or cloud computing instance types.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or programming languages used in the experiments.
Experiment Setup | Yes | For AdaGrad, ADAM, and RMSProp, we employ multiple initial learning rates, fixing the other hyperparameters as recommended in the original papers. The initial learning rates are searched over {10^-x} for x = 0, ..., 3. For RMSProp, we add 10^-4 in the real-world experiments so as to show that the best rate resides inside the search space, not on its boundary. Almeida requires careful tuning of the initial learning rate and the hyper-learning rate. We fix the hyper-learning rate as 10^-3, 10^-2, and 10^-1, and we search the initial learning rate so that the model parameters do not diverge. Predictive Model: We employ vector autoregression, VAR(p) (Lütkepohl 2005). (A learning-rate sweep sketch follows the table.)
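The paper's Algorithms 1 and 2 are not reproduced here. As an illustration only, the sketch below implements a simple exponentially forgetting mean estimator as a stand-in for the Sequential Mean Tracker: the class name `MeanTracker`, the fixed forgetting rate `eta`, and the toy drift scenario are all assumptions, since the paper's SMT chooses its rate adaptively.

```python
import numpy as np

class MeanTracker:
    """Illustrative stand-in for Algorithm 1 (SMT); NOT the paper's update.

    Tracks a possibly drifting mean with exponential forgetting:
    mu <- mu + eta * (x - mu). The paper's SMT adapts its rate online;
    here eta is fixed for simplicity (assumption).
    """

    def __init__(self, dim, eta=0.1):
        self.mu = np.zeros(dim)
        self.eta = eta  # fixed forgetting rate (assumption)

    def update(self, x):
        self.mu += self.eta * (x - self.mu)
        return self.mu

# Toy check: the estimate follows a mean that jumps (abrupt concept drift).
rng = np.random.default_rng(0)
tracker = MeanTracker(dim=1)
for t in range(2000):
    true_mean = 0.0 if t < 1000 else 3.0  # drift at t = 1000
    est = tracker.update(true_mean + rng.normal(size=1))
print(f"estimate after drift: {est[0]:.2f} (true mean 3.0)")
```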
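The evaluation protocol is online one-step-ahead forecasting with a VAR(3) model: predict the next data point, observe it, then update. The sketch below follows that protocol under stated assumptions: plain SGD on the squared loss stands in for Cogra and the baseline optimizers, and the helper names `toy_series` and `online_var_forecast` are hypothetical.

```python
import numpy as np

def toy_series(T=500, d=2, seed=1):
    """Stationary toy data: x_t = 0.5 * x_{t-1} + noise (assumption)."""
    rng = np.random.default_rng(seed)
    x = np.zeros((T, d))
    for t in range(1, T):
        x[t] = 0.5 * x[t - 1] + rng.normal(size=d)
    return x

def online_var_forecast(series, p=3, lr=0.01):
    """One-step-ahead online forecasting with a VAR(p) model.

    At each step t we predict x_t from the previous p observations,
    record the squared loss, then take one plain-SGD step. Plain SGD
    is a stand-in for Cogra/AdaGrad/ADAM/RMSProp in the paper's setup.
    """
    T, d = series.shape
    A = np.zeros((d, p * d))               # stacked lag-coefficient matrices
    b = np.zeros(d)                        # intercept
    losses = []
    for t in range(p, T):
        h = series[t - p:t][::-1].ravel()  # lagged context, newest lag first
        pred = A @ h + b
        err = pred - series[t]
        losses.append(float(err @ err))
        A -= lr * np.outer(err, h)         # gradient of 0.5 * ||err||^2
        b -= lr * err
    return np.array(losses)

print("mean predictive loss:", online_var_forecast(toy_series()).mean())
```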
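A minimal sketch of the learning-rate search over {10^-x} for x = 0, ..., 3, reusing `toy_series` and `online_var_forecast` from the previous sketch (so it is not self-contained on its own). Selecting by mean predictive loss and discarding non-finite (diverged) runs are assumptions, echoing the paper's note that some settings make the parameters diverge.

```python
import numpy as np

# Hypothetical sweep over initial learning rates {10^-x}, x = 0..3.
# Requires toy_series and online_var_forecast from the previous sketch.
series = toy_series(seed=2)

results = {}
for x in range(4):
    lr = 10.0 ** (-x)
    with np.errstate(over="ignore", invalid="ignore"):  # let bad rates diverge quietly
        results[lr] = float(np.mean(online_var_forecast(series, p=3, lr=lr)))

# Rates that make the parameters diverge yield non-finite losses and are
# discarded (cf. the Almeida note about avoiding divergence).
finite = {lr: v for lr, v in results.items() if np.isfinite(v)}
best_lr = min(finite, key=finite.get)
print("mean loss per rate:", {f"{lr:g}": v for lr, v in results.items()})
print("best initial learning rate:", best_lr)
```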