Entity Embedding-Based Anomaly Detection for Heterogeneous Categorical Events

Authors: Ting Chen, Lu-An Tang, Yizhou Sun, Zhengzhang Chen, Kai Zhang

IJCAI 2016

Reproducibility assessment: each variable is listed with its result and the LLM's supporting response.
Research Type: Experimental
    "Experimental results on real enterprise surveillance data show that our methods can accurately detect abnormal events compared to other state-of-the-art abnormal detection techniques."
Researcher Affiliation: Collaboration
    Ting Chen (Northeastern University), Lu-An Tang (NEC Labs America), Yizhou Sun (Northeastern University), Zhengzhang Chen (NEC Labs America), Kai Zhang (NEC Labs America). Emails: {tingchen, yzsun}@ccs.neu.edu; {ltang, zchen, kzhang}@nec-labs.com
Pseudocode: No
    The paper describes the model and learning procedure in text and mathematical equations, but includes no explicit pseudocode or algorithm blocks.
Open Source Code: No
    The paper does not mention releasing source code for the described method, nor does it link to any code repository.
Open Datasets: No
    The paper states that it uses "real surveillance data collected in an enterprise system during a two-week period" and notes, "We do not have the ground-truth labels for collected events." This indicates proprietary internal data with no public access information.
Dataset Splits: Yes
    "We split the two-week data into two one-week sets. The events in the first week are used as training set, and new events that only appeared in the second week are used as test sets." (Footnote 3: a randomly selected portion of the training set serves as a validation set for hyper-parameter selection.)
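The split described above can be sketched in a few lines. The event representation (toy triples over a 14-day window) and all variable names are illustrative assumptions, not details from the paper:

```python
# Sketch of the paper's split: week-1 events form the training set, and only
# events that never appeared in week 1 form the test set; a random slice of
# training events is held out for hyper-parameter validation (footnote 3).
import random

random.seed(0)

# Toy events: (day, (user_id, machine_id, process_id)) over a 14-day window.
events = [(random.randrange(14),
           (random.randrange(5), random.randrange(4), random.randrange(3)))
          for _ in range(500)]

week1 = {e for day, e in events if day < 7}
week2 = {e for day, e in events if day >= 7}

train = sorted(week1)
test = sorted(week2 - week1)      # only events new in the second week

# Hold out a random portion of training events as a validation set.
random.shuffle(train)
n_val = len(train) // 10
val, train = train[:n_val], train[n_val:]

print(len(train), len(val), len(test))
```

The key point is the set difference: an event seen during training is never scored at test time, so the test set contains only genuinely new event combinations.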
Hardware Specification: No
    The paper does not report hardware details such as CPU/GPU models, memory, or cloud computing instances used for the experiments.
Software Dependencies: No
    The paper does not list software dependencies with version numbers (e.g., specific programming-language or library versions).
Experiment Setup: Yes
    "For both APE and APE (no weight), the following setting is used: the embedding is randomly initialized, and dimension is set to 10; for each observed training event, we draw 3 negative samples for each entity type, which accounts for a total of 3m negative samples per training instance; we also use a mini-batch of size 128 to speed up stochastic gradient descent, and 5-10 epochs are generally enough for convergence."
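The reported negative-sampling scheme can be sketched as follows. This is a minimal illustration of "3 negatives per entity type, 3m per event, dimension 10" using NumPy; the function names, entity vocabularies, and initialization scale are assumptions, not the authors' implementation:

```python
import numpy as np

# Settings reported in the paper for APE and APE (no weight).
EMBED_DIM = 10       # embedding dimension
NEG_PER_TYPE = 3     # negative samples drawn per entity type
BATCH_SIZE = 128     # mini-batch size for SGD (5-10 epochs to converge)

rng = np.random.default_rng(0)

def init_embeddings(type_sizes):
    """One randomly initialized embedding table per entity type."""
    return [rng.normal(scale=0.1, size=(n, EMBED_DIM)) for n in type_sizes]

def draw_negatives(event, type_sizes):
    """Corrupt each entity slot NEG_PER_TYPE times, giving 3m negatives
    for an event with m entity types."""
    negatives = []
    for slot, n in enumerate(type_sizes):
        for _ in range(NEG_PER_TYPE):
            corrupted = list(event)
            corrupted[slot] = int(rng.integers(n))  # replace one entity
            negatives.append(tuple(corrupted))
    return negatives

type_sizes = [100, 50, 20]        # toy vocabularies for m = 3 entity types
tables = init_embeddings(type_sizes)
event = (5, 7, 2)                 # one observed training event
negatives = draw_negatives(event, type_sizes)
print(len(negatives))             # 3 * m = 9 negatives for this instance
```

Corrupting one entity slot at a time is what makes the count come out to 3m: each of the m types contributes exactly NEG_PER_TYPE corrupted copies of the observed event.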