Temporal Anomaly Detection: Calibrating the Surprise

Authors: Eyal Gutflaish, Aryeh Kontorovich, Sivan Sabato, Ofer Biller, Oded Sofer
Pages: 3755-3762

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide a detailed description of the algorithm, including a convergence analysis, and report encouraging empirical results. One of the data sets that we tested is new for the public domain. It consists of two months worth of database access records from a live system. This data set and our code are publicly available at https://github.com/eyalgut/TLR_anomaly_detection.git.
Researcher Affiliation | Collaboration | (1) Ben-Gurion University of the Negev, Beer Sheva, Israel; (2) IBM Security Division, Israel
Pseudocode | Yes | Algorithm 1 Find Model(λ, S): Find model matrix ... Algorithm 2 Folded LL(B_t, π̂, G, H, U, V)
Open Source Code | Yes | This data set and our code are publicly available at https://github.com/eyalgut/TLR_anomaly_detection.git.
Open Datasets | Yes | One of the data sets that we tested is new for the public domain. It consists of two months worth of database access records from a live system. This data set and our code are publicly available at https://github.com/eyalgut/TLR_anomaly_detection.git. ... The second data set is from Amazon (Lichman 2013). ... We further tested on the movie-rating data sets MovieLens (Harper and Konstan 2016) and Netflix (Bennett, Lanning, and others 2007).
Dataset Splits | Yes | We split S into two parts, S_1 = (B_1, ..., B_{T_1}), S_2 = (B_{T_1+1}, ..., B_T). S_1 is used to find an estimator π̂ for the probabilistic stationary model π, while S_2 is used to fit the log-likelihood regressor ŵ. ... k-fold cross-validation (k = 10) is performed to select λ ∈ Λ: In fold i, S_1 is divided to a training part S_1^t(i) and a validation part S_1^v(i)
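The temporal split and the λ-selection loop described in this row can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the names `temporal_split`, `kfold_parts`, `select_lambda`, and the `score` callback are hypothetical, and because the excerpt does not say how the folds partition S_1, the sketch assumes each fold validates on one contiguous block of intervals.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Batch = TypeVar("Batch")  # one interval's access records; kept abstract here


def temporal_split(S: Sequence[Batch], T1: int) -> Tuple[List[Batch], List[Batch]]:
    """Split S = (B_1, ..., B_T) into S_1 = (B_1, ..., B_{T_1}) and S_2 = (B_{T_1+1}, ..., B_T)."""
    if not 0 < T1 < len(S):
        raise ValueError("T1 must leave both S_1 and S_2 non-empty")
    return list(S[:T1]), list(S[T1:])


def kfold_parts(S1: Sequence[Batch], k: int = 10) -> List[Tuple[List[Batch], List[Batch]]]:
    """Return k (training, validation) pairs over S_1.

    Assumption: fold i validates on the i-th contiguous block of intervals
    and trains on all remaining intervals of S_1.
    """
    n = len(S1)
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= len(S1)")
    # Integer block boundaries; with k <= n every block holds at least one interval.
    bounds = [i * n // k for i in range(k + 1)]
    folds = []
    for i in range(k):
        lo, hi = bounds[i], bounds[i + 1]
        train = list(S1[:lo]) + list(S1[hi:])
        val = list(S1[lo:hi])
        folds.append((train, val))
    return folds


def select_lambda(S1: Sequence[Batch], lambdas: Sequence[float],
                  score: Callable[[float, List[Batch], List[Batch]], float],
                  k: int = 10) -> float:
    """Return the lambda in `lambdas` with the highest mean validation score.

    `score(lam, train, val)` is a hypothetical callback, e.g. the validation
    log-likelihood of a model fitted on `train` with parameter `lam`.
    """
    folds = kfold_parts(S1, k)

    def mean_score(lam: float) -> float:
        return sum(score(lam, tr, va) for tr, va in folds) / len(folds)

    return max(lambdas, key=mean_score)
```

In this sketch only S_1 is used for choosing λ, so S_2 stays untouched until the regressor ŵ is fitted, which mirrors the division of roles stated in the excerpt.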
Hardware Specification | Yes | Table 2: Run-time (seconds) on an 2.8GHz Xeon CPU with 40 cores and 256 GB RAM.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | The available data sets do not contain known anomalous accesses. Thus, in our experiments we injected anomalous behavior into random intervals, as explained below. ... For our algorithm, we used the following natural time-dependent features for regression: A binary weekend feature, the log-likelihood of the previous interval and of the one 24 hours ago (for TDA) or a week ago (for the others), the number of accesses in the current interval, the number of intervals since the last training set interval, day-of-the-week, and for TDA also hour of the day h ∈ {1, ..., 24} and shifted hour of the day ((h + 12) mod 24).
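A minimal sketch of how the listed regression features could be assembled for a single interval. The function name `regression_features`, its parameters, the interval lengths (hourly for TDA, daily for the other data sets), and the integer day-of-week encoding are assumptions made for illustration; the excerpt names the features but not their exact encoding.

```python
from datetime import datetime
from typing import Dict, List, Sequence


def regression_features(start: datetime, t: int, ll: Dict[int, float],
                        n_accesses: int, last_train_t: int, hourly: bool,
                        weekend_days: Sequence[int] = (5, 6)) -> List[float]:
    """Time-dependent regression features for interval t, which begins at `start`.

    Assumptions (not from the paper): intervals are indexed by consecutive
    integers; TDA intervals are one hour long (so "24 hours ago" is 24
    intervals back) and the other data sets use daily intervals (so "a week
    ago" is 7 intervals back); `ll[s]` is the log-likelihood of interval s.
    `weekend_days` uses datetime.weekday() codes (Monday=0) and is a
    parameter because the weekend convention depends on the data's locale.
    """
    lag = 24 if hourly else 7
    dow = start.weekday()
    feats = [
        1.0 if dow in weekend_days else 0.0,  # binary weekend feature
        ll[t - 1],                            # log-likelihood of the previous interval
        ll[t - lag],                          # 24 hours ago (hourly) or a week ago (daily)
        float(n_accesses),                    # accesses in the current interval
        float(t - last_train_t),              # intervals since the last training-set interval
        float(dow),                           # day of the week, encoded here as 0..6
    ]
    if hourly:
        h = start.hour + 1                    # hour of the day, h in {1, ..., 24}
        feats += [float(h), float((h + 12) % 24)]  # hour and shifted hour ((h + 12) mod 24)
    return feats
```

Passing `weekend_days` explicitly avoids hard-coding Saturday and Sunday; for data recorded where the weekend falls on Friday and Saturday, `(4, 5)` would be the matching choice.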