Entity Embedding-Based Anomaly Detection for Heterogeneous Categorical Events

Authors: Ting Chen, Lu-An Tang, Yizhou Sun, Zhengzhang Chen, Kai Zhang

IJCAI 2016

Reproducibility assessment: each variable is listed with its result and the LLM's supporting response.
Research Type: Experimental
    "Experimental results on real enterprise surveillance data show that our methods can accurately detect abnormal events compared to other state-of-the-art abnormal detection techniques."
Researcher Affiliation: Collaboration
    Ting Chen (Northeastern University), Lu-An Tang (NEC Labs America), Yizhou Sun (Northeastern University), Zhengzhang Chen (NEC Labs America), Kai Zhang (NEC Labs America). Emails: {tingchen, yzsun}@ccs.neu.edu; {ltang, zchen, kzhang}@nec-labs.com
Pseudocode: No
    The paper describes the model and learning procedure in text and mathematical equations, but includes no explicit pseudocode or algorithm blocks.
Open Source Code: No
    The paper does not mention releasing source code for the described method, nor does it link to any code repository.
Open Datasets: No
    The paper states that it uses "real surveillance data collected in an enterprise system during a two-week period" and notes, "We do not have the ground-truth labels for collected events." This indicates proprietary internal data with no public access information.
Dataset Splits: Yes
    "We split the two-week data into two one-week sets. The events in the first week are used as training set, and new events that only appeared in the second week are used as test sets." (Footnote 3: a randomly selected portion of the training set serves as a validation set for hyper-parameter selection.)
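The split described above can be sketched in a few lines. The event representation (toy triples over a 14-day window) and all variable names are illustrative assumptions, not details from the paper:

```python
# Sketch of the paper's split: week-1 events form the training set, and only
# events that never appeared in week 1 form the test set; a random slice of
# training events is held out for hyper-parameter validation (footnote 3).
import random

random.seed(0)

# Toy events: (day, (user_id, machine_id, process_id)) over a 14-day window.
events = [(random.randrange(14),
           (random.randrange(5), random.randrange(4), random.randrange(3)))
          for _ in range(500)]

week1 = {e for day, e in events if day < 7}
week2 = {e for day, e in events if day >= 7}

train = sorted(week1)
test = sorted(week2 - week1)      # only events new in the second week

# Hold out a random portion of training events as a validation set.
random.shuffle(train)
n_val = len(train) // 10
val, train = train[:n_val], train[n_val:]

print(len(train), len(val), len(test))
```

The key point is the set difference: an event seen during training is never scored at test time, so the test set contains only genuinely new event combinations.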
Hardware Specification: No
    The paper does not report hardware details such as CPU/GPU models, memory, or cloud computing instances used for the experiments.
Software Dependencies: No
    The paper does not list software dependencies with version numbers (e.g., specific programming-language or library versions).
Experiment Setup: Yes
    "For both APE and APE (no weight), the following setting is used: the embedding is randomly initialized, and dimension is set to 10; for each observed training event, we draw 3 negative samples for each entity type, which accounts for a total of 3m negative samples per training instance; we also use a mini-batch of size 128 to speed up stochastic gradient descent, and 5-10 epochs are generally enough for convergence."
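The reported negative-sampling scheme can be sketched as follows. This is a minimal illustration of "3 negatives per entity type, 3m per event, dimension 10" using NumPy; the function names, entity vocabularies, and initialization scale are assumptions, not the authors' implementation:

```python
import numpy as np

# Settings reported in the paper for APE and APE (no weight).
EMBED_DIM = 10       # embedding dimension
NEG_PER_TYPE = 3     # negative samples drawn per entity type
BATCH_SIZE = 128     # mini-batch size for SGD (5-10 epochs to converge)

rng = np.random.default_rng(0)

def init_embeddings(type_sizes):
    """One randomly initialized embedding table per entity type."""
    return [rng.normal(scale=0.1, size=(n, EMBED_DIM)) for n in type_sizes]

def draw_negatives(event, type_sizes):
    """Corrupt each entity slot NEG_PER_TYPE times, giving 3m negatives
    for an event with m entity types."""
    negatives = []
    for slot, n in enumerate(type_sizes):
        for _ in range(NEG_PER_TYPE):
            corrupted = list(event)
            corrupted[slot] = int(rng.integers(n))  # replace one entity
            negatives.append(tuple(corrupted))
    return negatives

type_sizes = [100, 50, 20]        # toy vocabularies for m = 3 entity types
tables = init_embeddings(type_sizes)
event = (5, 7, 2)                 # one observed training event
negatives = draw_negatives(event, type_sizes)
print(len(negatives))             # 3 * m = 9 negatives for this instance
```

Corrupting one entity slot at a time is what makes the count come out to 3m: each of the m types contributes exactly NEG_PER_TYPE corrupted copies of the observed event.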