Entity Embedding-Based Anomaly Detection for Heterogeneous Categorical Events
Authors: Ting Chen, Lu-An Tang, Yizhou Sun, Zhengzhang Chen, Kai Zhang
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on real enterprise surveillance data show that our methods can accurately detect abnormal events compared to other state-of-the-art abnormal detection techniques. |
| Researcher Affiliation | Collaboration | Ting Chen¹, Lu-An Tang², Yizhou Sun¹, Zhengzhang Chen², Kai Zhang²; ¹Northeastern University, ²NEC Labs America; {tingchen, yzsun}@ccs.neu.edu, {ltang, zchen, kzhang}@nec-labs.com |
| Pseudocode | No | The paper describes the model and learning process using text and mathematical equations, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not mention providing open-source code for the described methodology or link to any code repository. |
| Open Datasets | No | The paper states, 'real surveillance data collected in an enterprise system during a two-week period' and 'We do not have the ground-truth labels for collected events'. This indicates the use of proprietary internal data without public access information. |
| Dataset Splits | Yes | We split the two-week data into two of one-weeks. The events in the first week are used as training set³, and new events that only appeared in the second week are used as test sets. (Footnote 3: With randomly selected portion as validation set for selection of hyper-parameters.) |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud computing instances used for the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific programming language versions or library versions). |
| Experiment Setup | Yes | For both APE and APE (no weight), the following setting is used: the embedding is randomly initialized, and dimension is set to 10; for each observed training event, we draw 3 negative samples for each of the entity type, which accounts for a total of 3m negative samples per training instance; we also use a mini-batch of size 128 for speed up stochastic gradient descent, and 5-10 epochs are general enough for convergence. |
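
The negative-sampling arithmetic in the experiment setup (3 negatives per entity type, so 3m per training event with m entity types) can be illustrated with a minimal sketch. The function name `draw_negative_samples`, the tuple-of-ids event representation, and the uniform choice of replacement entities are assumptions made for illustration; the table does not state which noise distribution the paper samples from, and this is not the authors' implementation.

```python
import random

NEG_PER_TYPE = 3  # negatives per entity type, as reported in the setup


def draw_negative_samples(event, vocab_sizes, k=NEG_PER_TYPE, rng=random):
    """Return k * m corrupted copies of `event`, k for each entity type.

    event: tuple of m integer entity ids, one per entity type.
    vocab_sizes: number of distinct entities of each type (each >= 2).
    Assumption: replacements are drawn uniformly at random; the paper's
    actual noise distribution may differ.
    """
    negatives = []
    for type_idx, size in enumerate(vocab_sizes):
        for _ in range(k):
            # Sample from the size - 1 ids other than the observed one.
            replacement = rng.randrange(size - 1)
            if replacement >= event[type_idx]:
                replacement += 1
            corrupted = list(event)
            corrupted[type_idx] = replacement
            negatives.append(tuple(corrupted))
    return negatives


# Hypothetical event with m = 3 entity types.
rng = random.Random(0)
event = (4, 0, 7)
vocab_sizes = [10, 5, 20]
negatives = draw_negative_samples(event, vocab_sizes, rng=rng)
print(len(negatives))  # 3 * m = 9
```

Each corrupted event differs from the observed one in exactly one entity, so a mini-batch of 128 observed events would be accompanied by 128 × 3m negatives under this scheme.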