Stochastic Online Anomaly Analysis for Streaming Time Series

Authors: Zhao Xu, Kristian Kersting, Lorenzo von Ritter

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical analysis on real-world datasets demonstrates the effectiveness of our method. We verify the proposed OLAD method on both real-world and synthetic data. We first evaluate the performance of the OLAD in an online anomaly detection scenario. Experiments on network traffic data: We use the Yahoo dataset of real network traffic to some of the Yahoo services (https://webscope.sandbox.yahoo.com/catalog.php?datatype=s&did=70). Experiments on financial data: We also validate the OLAD method with the S&P 500 index data (https://fred.stlouisfed.org/series/SP500) from January 2012 to January 2017. Experiments on synthetic data: We further conduct some supplementary experiments to evaluate the predictive performance of the OLAD method in learning the underlying dynamics of the contaminated time series with the simulated data.
Researcher Affiliation | Collaboration | ¹NEC Labs Europe, Germany; ²Technical University of Darmstadt, Germany; ³Technical University of Munich, Germany
Pseudocode | Yes | Algorithm 1: Online one-step ahead prediction for streaming time series
Open Source Code | No | The paper mentions the Twitter AnomalyDetection repository (https://github.com/twitter/AnomalyDetection) for a baseline method, but does not provide a link to, or any explicit statement about, source code for its own method.
Open Datasets | Yes | Experiments on network traffic data: We use the Yahoo dataset of real network traffic to some of the Yahoo services (https://webscope.sandbox.yahoo.com/catalog.php?datatype=s&did=70). Experiments on financial data: We also validate the OLAD method with the S&P 500 index data (https://fred.stlouisfed.org/series/SP500) from January 2012 to January 2017.
Dataset Splits | No | The paper does not specify explicit training, validation, and test splits with percentages or counts. It mentions using the first T = 100 time steps of the Yahoo dataset for initialization, but this is not a general train/validation/test split for the entire dataset.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions baseline methods such as HESD [Vallis et al., 2014], GPEVT [Smith et al., 2012], and GP [Rasmussen and Williams, 2006], but does not specify any software libraries or dependencies with version numbers used for implementation.
Experiment Setup | Yes | For each time series, the observations collected at the first T = 100 time steps are viewed as initialization. At each time step t after the initial period, we make a one-step-ahead prediction for the next step t + 1 using the OLAD method. If the real observation y_{t+1} falls far outside the 99.99% predictive interval, then the observation at time t + 1 is identified as an anomaly event. We set the parameters of the kernel function as: ρ = 1.0 and ℓ = exp(2.0). The length of the time series was n = 100. We assume 30 time steps observed, and predict the remaining part of the time series. For the observed time steps, we randomly add m = 0, 1, ..., 5 outliers.
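The anomaly rule quoted in the setup row above (flag an observation that falls outside the 99.99% one-step-ahead predictive interval) can be sketched with a plain Gaussian-process regressor. This is an illustrative sketch only, not the paper's OLAD implementation: the squared-exponential kernel form, the observation-noise level, and the z ≈ 3.89 threshold (the two-sided 99.99% Gaussian quantile) are assumptions; only the hyperparameter values ρ = 1.0 and ℓ = exp(2.0) come from the quoted setup.

```python
import numpy as np

def rbf_kernel(s, t, rho=1.0, ell=np.exp(2.0)):
    """Squared-exponential kernel over time indices.

    rho and ell default to the values quoted in the experiment setup
    (rho = 1.0, ell = exp(2.0)); the kernel form itself is an assumption.
    """
    s, t = np.asarray(s, dtype=float), np.asarray(t, dtype=float)
    return rho * np.exp(-((s[:, None] - t[None, :]) ** 2) / (2.0 * ell ** 2))

def one_step_ahead(times, values, t_next, noise=1e-2):
    """Standard GP posterior mean and variance at the next time step.

    `noise` is an assumed observation-noise variance (not from the paper).
    """
    K = rbf_kernel(times, times) + noise * np.eye(len(times))
    k_star = rbf_kernel(np.array([t_next]), times)          # shape (1, n)
    alpha = np.linalg.solve(K, np.asarray(values, float))
    mean = (k_star @ alpha).item()
    var = (rbf_kernel(np.array([t_next]), np.array([t_next]))
           - k_star @ np.linalg.solve(K, k_star.T)).item() + noise
    return mean, var

def is_anomaly(y_obs, mean, var, z=3.89):
    """Flag y_obs if it lies outside the 99.99% predictive interval.

    z = 3.89 is approximately the two-sided 99.99% Gaussian quantile.
    """
    return abs(y_obs - mean) > z * np.sqrt(var)

# Usage: a smooth series, one clean next observation, one injected spike.
times = np.arange(30, dtype=float)
values = np.sin(0.1 * times)
mean, var = one_step_ahead(times, values, 30.0)
print(is_anomaly(np.sin(3.0), mean, var))         # clean point: not flagged
print(is_anomaly(np.sin(3.0) + 10.0, mean, var))  # large spike: flagged
```

The GP here is stationary and refit from scratch at each step; the paper's OLAD additionally handles streaming updates and contaminated observations, which this sketch does not attempt.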