Towards a Hierarchical Bayesian Model of Multi-View Anomaly Detection

Authors: Zhen Wang, Chao Lan

IJCAI 2020

Reproducibility Variable Result LLM Response
Research Type: Experimental
Response: "In the experiment, we show the proposed Bayesian detector consistently outperforms state-of-the-art counterparts across several public data sets and three well-known types of multi-view anomalies." / "In the experiment, we show the proposed model consistently outperforms state-of-the-art multi-view anomaly detectors across both synthetic and real-world multi-view data."
Researcher Affiliation: Academia
Response: "Zhen Wang and Chao Lan, Department of Computer Science, University of Wyoming, WY, USA, {zwang10, clan}@uwyo.edu"
Pseudocode: Yes
Response:
Algorithm 1: Compute Optimal Threshold
Input: data {X, X′}, swapping rate γ, detection rate ζ
Output: detection threshold τ̂_ζ
1: Generate the mixture set X^γ by randomly swapping views.
2: Compute anomaly scores for all points in X and X^γ via Eq. (28), and denote them S and S^γ respectively.
3: Calculate the empirical CDF F̂_a.
4: Optimize the threshold by τ̂_ζ = max{ s(x) ∈ S ∪ S^γ | F̂_a(s(x)) ≤ 1 − ζ }.
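The quoted thresholding procedure can be sketched as below. This is a minimal illustration under our own assumptions, not the authors' code: scores are taken as given (the paper's Eq. (28) is not reproduced here), `swap_views` is a hypothetical helper for the view-swapping step, and the condition F̂_a(s) ≤ 1 − ζ is our reading of the garbled step 4.

```python
import numpy as np

def swap_views(X1, X2, gamma, rng):
    """Step 1 (sketch): swap the two views of a gamma-fraction of points,
    producing a mixture set of synthetic cross-view anomalies."""
    n = X1.shape[0]
    idx = rng.choice(n, size=int(gamma * n), replace=False)
    perm = rng.permutation(idx)          # pair each chosen point with another
    X2_mix = X2.copy()
    X2_mix[idx] = X2[perm]               # view 2 no longer matches view 1
    return X1[idx], X2_mix[idx]

def optimal_threshold(scores, scores_gamma, zeta):
    """Steps 3-4 (sketch): tau = max{ s in S u S^gamma : ECDF(s) <= 1 - zeta }."""
    all_scores = np.sort(np.concatenate([scores, scores_gamma]))
    ecdf = np.arange(1, all_scores.size + 1) / all_scores.size
    ok = all_scores[ecdf <= 1 - zeta]
    return ok.max() if ok.size else all_scores.min()
```

With a detection rate ζ, the returned threshold is roughly the (1 − ζ)-quantile of the pooled scores, so about a ζ-fraction of pooled points score above it.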
Open Source Code: No
Response: "The paper does not contain any explicit statements or links indicating that the source code for the methodology described is publicly available or released."
Open Datasets: Yes
Response: "We now show the effectiveness of the proposed method on the public Outlier Detection Datasets (ODDS) [2], the WebKB dataset [3] and the MovieLens dataset [4]."
[2] http://odds.cs.stonybrook.edu
[3] http://lig-membres.imag.fr/grimal/data.html
[4] https://grouplens.org/datasets/movielens/latest
Dataset Splits: No
Response: The paper states: "After the outlier generation stage, we equivalently split all normal instances into two parts, and use one of them as the training set to train the proposed model. Then we verify the outlier detection performance on the test set." While a train/test split is described, there is no mention of a separate validation set, and no detail beyond "equivalently split" (e.g. a random seed) that would make the split reproducible.
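The described split can be sketched as follows; `split_normals` is a hypothetical helper, and the shuffle-then-halve mechanism and seed are our assumptions, since the paper specifies neither.

```python
import numpy as np

def split_normals(X_normal, rng):
    """Shuffle the normal instances and split them into two equal halves:
    one half trains the model, the other serves as the test pool."""
    perm = rng.permutation(X_normal.shape[0])
    half = perm.size // 2
    return X_normal[perm[:half]], X_normal[perm[half:]]
```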
Hardware Specification: No
Response: "The paper describes the proposed model and experimental evaluations, but it does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used to run the experiments."
Software Dependencies: No
Response: "The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks, or specific solvers)."
Experiment Setup: Yes
Response: "In particular, we assign the automatic relevance determination (ARD) prior [Neal, 2012] on the projection matrices to sparsify their columns for automatically determining the dimension of the latent factor; we also place Student's t distributions on the latent-factor prior and the likelihood to improve robustness of the estimator [Archambeau et al., 2006; Gai et al., 2008]. Since we have no further knowledge about the hyperparameters of the priors, we choose broad ones by setting a_α = b_α = β_v = 10^-3, K_v = 10^-3 I_{d_v}, ν_v = d_v + 1, a_ν = 2 and b_ν = 0.1, m = min{d_v − 1 : v = 1, …, V}. On each dataset, we repeat the random outlier generation procedure 20 times and at each time, we perturb 2.5% of the data in that procedure."
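The quoted hyperparameter settings can be collected into a single configuration sketch. The dictionary keys and the helper name are our own; in particular, the reading of the cap on the latent dimension as m = min{d_v − 1} is an assumption, since that expression is garbled in the extracted text.

```python
import numpy as np

def default_hyperparams(view_dims):
    """Broad priors quoted from the paper; view_dims lists d_v for each view v.
    (Key names are hypothetical; m = min{d_v - 1} is our reading.)"""
    return {
        "a_alpha": 1e-3, "b_alpha": 1e-3,             # ARD Gamma prior
        "beta_v": 1e-3,
        "K_v": [1e-3 * np.eye(d) for d in view_dims], # per-view scale matrices
        "nu_v": [d + 1 for d in view_dims],           # degrees of freedom
        "a_nu": 2, "b_nu": 0.1,                       # Student-t dof prior
        "m": min(d - 1 for d in view_dims),           # latent dimension cap
    }

# Outlier-generation protocol reported in the paper:
N_REPEATS, PERTURB_RATE = 20, 0.025  # 20 random repetitions, 2.5% perturbed
```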