Fast Incremental SVDD Learning Algorithm with the Gaussian Kernel
Authors: Hansi Jiang, Haoyu Wang, Wenhao Hu, Deovrat Kakde, Arin Chaudhuri (pp. 3991-3998)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on some real data sets indicate that FISVDD demonstrates significant gains in efficiency with almost no loss in either outlier detection accuracy or objective function value. We examined the performance of FISVDD with four real data sets: shuttle data (Lichman 2013), mammography data (Woods et al. 1993), forest cover (Forest Type) data (Rayana 2016), and the SMTP subset of KDD Cup 99 data (Rayana 2016). The purpose of our experiments is to show that, compared to the incremental SVM method (which can achieve globally optimal solutions), the FISVDD method loses little in either objective function value or outlier detection accuracy while demonstrating significant gains in efficiency. |
| Researcher Affiliation | Industry | Hansi Jiang, Haoyu Wang, Wenhao Hu, Deovrat Kakde, Arin Chaudhuri SAS Institute Inc. 100 SAS Campus Drive Cary, North Carolina 27513 {Hansi.Jiang; Haoyu.Wang; Wenhao.Hu; Dev.Kakde; Arin.Chaudhuri}@sas.com |
| Pseudocode | Yes | The FISVDD algorithm is shown in Algorithm 3. It contains three parts of FISVDD: expanding (which is shown in Algorithm 1), shrinking (which is shown in Algorithm 2), and bookkeeping. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described, nor does it explicitly state that the code is released or available. |
| Open Datasets | Yes | We examined the performance of FISVDD with four real data sets: shuttle data (Lichman 2013), mammography data (Woods et al. 1993), forest cover (Forest Type) data (Rayana 2016), and the SMTP subset of KDD Cup 99 data (Rayana 2016). |
| Dataset Splits | Yes | Our experiments used 4/5 of the normal data, randomly chosen, for training. The remaining normal data and the outliers together form the testing sets. Proper Gaussian bandwidths are selected by using fivefold cross validation. |
| Hardware Specification | No | The paper does not provide specific hardware details (like exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper states "SAS/IML R software is used in performing the experiments," but it does not provide specific version numbers for SAS/IML or any other software dependencies needed to replicate the experiment. |
| Experiment Setup | Yes | Throughout this paper, it is assumed that the Gaussian similarity is used and that a proper Gaussian kernel bandwidth σ has been chosen such that the number of support vectors is much less than the number of observations. Proper Gaussian bandwidths are selected by using fivefold cross validation. |
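The split protocol quoted above (4/5 of the normal data for training, the rest plus outliers for testing, with fivefold cross validation for the bandwidth σ) can be sketched as follows. This is not the authors' code; the synthetic data and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the paper's data: "normal" points and outliers.
normal = rng.normal(size=(100, 2))
outliers = rng.normal(loc=5.0, size=(10, 2))

# 4/5 of the normal data, randomly chosen, forms the training set.
perm = rng.permutation(len(normal))
n_train = int(len(normal) * 4 / 5)
train = normal[perm[:n_train]]

# The remaining normal data and the outliers together form the testing set.
test = np.vstack([normal[perm[n_train:]], outliers])

# Fivefold cross-validation index folds over the training set, as would be
# used to select the Gaussian kernel bandwidth sigma.
folds = np.array_split(rng.permutation(n_train), 5)

print(len(train), len(test), [len(f) for f in folds])
```

Each candidate bandwidth would then be scored across the five folds, and the best-scoring σ used to fit the final model on the full training set.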