Fast Incremental SVDD Learning Algorithm with the Gaussian Kernel
Authors: Hansi Jiang, Haoyu Wang, Wenhao Hu, Deovrat Kakde, Arin Chaudhuri (pp. 3991-3998)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on some real data sets indicate that FISVDD demonstrates significant gains in efficiency with almost no loss in either outlier detection accuracy or objective function value. We examined the performance of FISVDD with four real data sets: shuttle data (Lichman 2013), mammography data (Woods et al. 1993), forest cover (Forest Type) data (Rayana 2016), and the SMTP subset of KDD Cup 99 data (Rayana 2016). The purpose of our experiments is to show that, compared to the incremental SVM method (which can achieve globally optimal solutions), the FISVDD method loses little in either objective function value or outlier detection accuracy while demonstrating significant gains in efficiency. |
| Researcher Affiliation | Industry | Hansi Jiang, Haoyu Wang, Wenhao Hu, Deovrat Kakde, Arin Chaudhuri SAS Institute Inc. 100 SAS Campus Drive Cary, North Carolina 27513 {Hansi.Jiang; Haoyu.Wang; Wenhao.Hu; Dev.Kakde; Arin.Chaudhuri}@sas.com |
| Pseudocode | Yes | The FISVDD algorithm is shown in Algorithm 3. It contains three parts of FISVDD: expanding (which is shown in Algorithm 1), shrinking (which is shown in Algorithm 2), and bookkeeping. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described, nor does it explicitly state that the code is released or available. |
| Open Datasets | Yes | We examined the performance of FISVDD with four real data sets: shuttle data (Lichman 2013), mammography data (Woods et al. 1993), forest cover (Forest Type) data (Rayana 2016), and the SMTP subset of KDD Cup 99 data (Rayana 2016). |
| Dataset Splits | Yes | Our experiments used 4/5 of the normal data, randomly chosen, for training. The remaining normal data and the outliers together form the testing sets. Proper Gaussian bandwidths are selected by using fivefold cross validation. |
| Hardware Specification | No | The paper does not provide specific hardware details (like exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper states "SAS/IML R software is used in performing the experiments," but it does not provide specific version numbers for SAS/IML or any other software dependencies needed to replicate the experiment. |
| Experiment Setup | Yes | Throughout this paper, it is assumed that the Gaussian similarity is used and that a proper Gaussian kernel bandwidth σ has been chosen such that the number of support vectors is much less than the number of observations. Proper Gaussian bandwidths are selected by using fivefold cross validation. |
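The split protocol quoted above (4/5 of the normal data for training, the rest plus outliers for testing, with fivefold cross validation for the bandwidth σ) can be sketched as follows. This is not the authors' code; the synthetic data and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the paper's data: "normal" points and outliers.
normal = rng.normal(size=(100, 2))
outliers = rng.normal(loc=5.0, size=(10, 2))

# 4/5 of the normal data, randomly chosen, forms the training set.
perm = rng.permutation(len(normal))
n_train = int(len(normal) * 4 / 5)
train = normal[perm[:n_train]]

# The remaining normal data and the outliers together form the testing set.
test = np.vstack([normal[perm[n_train:]], outliers])

# Fivefold cross-validation index folds over the training set, as would be
# used to select the Gaussian kernel bandwidth sigma.
folds = np.array_split(rng.permutation(n_train), 5)

print(len(train), len(test), [len(f) for f in folds])
```

Each candidate bandwidth would then be scored across the five folds, and the best-scoring σ used to fit the final model on the full training set.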