Beyond Individual Input for Deep Anomaly Detection on Tabular Data
Authors: Hugo Thimonier, Fabrice Popineau, Arpad Rimmel, Bich-Liên Doan
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on 31 benchmark tabular datasets, we demonstrate that our method achieves state-of-the-art performance, outperforming existing methods by 2.4% and 1.2% in terms of F1-score and AUROC, respectively. Our ablation study further proves that modeling both types of dependencies is crucial for anomaly detection on tabular data. |
| Researcher Affiliation | Academia | Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire Interdisciplinaire des Sciences du Numérique, 91190, Gif-sur-Yvette, France. |
| Pseudocode | Yes | Algorithm 1 Pseudo Python Code for Mask-KNN |
| Open Source Code | Yes | Each experiment can be replicated using the code made available on GitHub: https://github.com/hugothimonier/NPT-AD/ |
| Open Datasets | Yes | The benchmark comprises two datasets widely used in the anomaly detection literature, namely Arrhythmia and Thyroid, plus a second group, the Multidimensional point datasets, obtained from the Outlier Detection Data Sets (ODDS) and containing 28 datasets. ... Instead, we include three real-world datasets from (Han et al., 2022) that display relatively similar characteristics to KDD in terms of dimensions: fraud, campaign, and backdoor. |
| Dataset Splits | Yes | Per the literature (Zong et al., 2018; Bergman & Hoshen, 2020), we construct the training set from a random subsample containing 50% of the normal samples; the remaining 50% are concatenated with the entire set of anomalies to constitute the validation set. A minimal split sketch is given below the table. |
| Hardware Specification | Yes | Our model was trained for each dataset on 4 or 8 Nvidia V100 GPUs (16 GB/32 GB), depending on the dataset dimension. |
| Software Dependencies | No | The paper mentions optimizers like LAMB and Lookahead, but does not provide specific version numbers for programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or other relevant libraries. |
| Experiment Setup | Yes | For each dataset, we considered the same NPT architecture composed of 4 layers alternating between Attention Between Datapoints and Attention Between Attributes, with 4 attention heads. Per (Kossen et al., 2021), we consider a row-wise feed-forward (rFF) network with one hidden layer, 4x expansion factor, GELU activation, and also include dropout with p = 0.1 for both attention weights and hidden layers. We used LAMB (You et al., 2020) with β = (0.9, 0.999) as the optimizer and also included a Lookahead (Zhang et al., 2019) wrapper with slow update rate α = 0.5 and k = 6 steps between updates. Similarly, following (Kossen et al., 2021), we consider a flat-then-anneal learning rate schedule: flat at the base learning rate for 70% of the steps, then annealed to 0 following a cosine schedule by the end of the training phase, with gradient clipping set at 1. We chose r in accordance with the masking probability pmask used during training and the total number of features d. A schedule sketch appears below the table. |
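
The Dataset Splits row follows the protocol of Zong et al. (2018) and Bergman & Hoshen (2020): train on a random half of the normal samples, validate on the remaining normals plus all anomalies. Below is a minimal NumPy sketch of that split, assuming a labeling convention where `y == 0` marks normal samples and `y == 1` marks anomalies; the function name and signature are illustrative, not taken from the authors' code.

```python
import numpy as np

def make_splits(X, y, seed=0):
    """Train/validation split following the quoted protocol.

    Training set: a random 50% of the normal samples.
    Validation set: the remaining normals plus every anomaly.
    """
    rng = np.random.default_rng(seed)
    normal_idx = np.flatnonzero(y == 0)   # assumed convention: 0 = normal
    anomaly_idx = np.flatnonzero(y == 1)  # assumed convention: 1 = anomaly

    # Randomly select 50% of the normal samples for training.
    shuffled = rng.permutation(normal_idx)
    half = len(shuffled) // 2
    train_idx = shuffled[:half]

    # Remaining normals + all anomalies form the validation set.
    val_idx = np.concatenate([shuffled[half:], anomaly_idx])

    return X[train_idx], X[val_idx], y[val_idx]
```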
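
The Experiment Setup row describes a flat-then-anneal learning-rate schedule (flat at the base rate for 70% of the steps, then cosine-annealed to 0) and gradient clipping at 1. The PyTorch sketch below shows one way to express that schedule with `LambdaLR`; `AdamW` stands in for LAMB + Lookahead, which do not ship with PyTorch, and the names `flat_then_anneal`, `total_steps`, and `flat_frac` are our own, not taken from the paper.

```python
import math
import torch

def flat_then_anneal(total_steps, flat_frac=0.7):
    """LambdaLR multiplier: 1.0 for the first flat_frac of steps,
    then cosine-annealed to 0 by the end of training."""
    def lr_lambda(step):
        if step < flat_frac * total_steps:
            return 1.0
        # Progress through the annealing phase, in [0, 1].
        progress = (step - flat_frac * total_steps) / ((1.0 - flat_frac) * total_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))
    return lr_lambda

# Illustrative wiring with a stand-in model and optimizer.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=flat_then_anneal(total_steps=1000)
)

for step in range(1000):
    loss = model(torch.randn(8, 10)).pow(2).mean()  # dummy forward pass
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clipping at 1
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```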