Beyond Individual Input for Deep Anomaly Detection on Tabular Data

Authors: Hugo Thimonier, Fabrice Popineau, Arpad Rimmel, Bich-Liên Doan

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through extensive experiments on 31 benchmark tabular datasets, we demonstrate that our method achieves state-of-the-art performance, outperforming existing methods by 2.4% and 1.2% in terms of F1-score and AUROC, respectively. Our ablation study further proves that modeling both types of dependencies is crucial for anomaly detection on tabular data."
Researcher Affiliation | Academia | "Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire Interdisciplinaire des Sciences du Numérique, 91190, Gif-sur-Yvette, France."
Pseudocode | Yes | "Algorithm 1: Pseudo Python Code for Mask-KNN" (a hedged Mask-KNN sketch follows this table).
Open Source Code | Yes | "Each experiment can be replicated using the code made available on GitHub: https://github.com/hugothimonier/NPT-AD/"
Open Datasets | Yes | "The benchmark comprises two datasets widely used in the anomaly detection literature, namely Arrhythmia and Thyroid, and a second group, the Multidimensional Point datasets, obtained from the Outlier Detection DataSets (ODDS) and containing 28 datasets. ... Instead, we include three real-world datasets from (Han et al., 2022) that display relatively similar characteristics to KDD in terms of dimensions: fraud, campaign, and backdoor."
Dataset Splits | Yes | "Per the literature (Zong et al., 2018; Bergman & Hoshen, 2020), we construct the training set from a random subsample containing 50% of the normal samples, and concatenate the remaining 50% of normal samples with the entire set of anomalies to constitute the validation set." (A split sketch follows this table.)
Hardware Specification | Yes | "Our model was trained for each dataset on 4 or 8 Nvidia V100 GPUs (16 GB or 32 GB), depending on the dataset dimension."
Software Dependencies | No | The paper mentions optimizers like LAMB and Lookahead, but does not provide specific version numbers for programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or other relevant libraries.
Experiment Setup | Yes | "For each dataset, we considered the same NPT architecture composed of 4 layers alternating between Attention Between Datapoints and Attention Between Attributes, with 4 attention heads. Per (Kossen et al., 2021), we consider a row-wise feed-forward (rFF) network with one hidden layer, a 4x expansion factor, GeLU activation, and dropout with p = 0.1 for both attention weights and hidden layers. We used LAMB (You et al., 2020) with β = (0.9, 0.999) as the optimizer, wrapped in Lookahead (Zhang et al., 2019) with slow update rate α = 0.5 and k = 6 steps between updates. Following (Kossen et al., 2021), we use a flat-then-anneal learning rate schedule: flat at the base learning rate for 70% of steps, then annealed to 0 by a cosine schedule by the end of training, with gradient clipping at 1. We chose r in accordance with the masking probability p_mask used during training and the total number of features d." (A sketch of the flat-then-anneal schedule follows this table.)
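
The paper's Algorithm 1 is only referenced by name above, so the listing below is a minimal, hypothetical sketch of what a mask-then-impute KNN anomaly scorer of that kind might look like: each feature of a test sample is masked in turn, imputed from the k nearest training (normal) samples found on the remaining features, and the average imputation error serves as the anomaly score. All names (mask_knn_score, X_train, X_val, k) are illustrative and not taken from the authors' code.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mask_knn_score(X_train, X_val, k=5):
    """Hypothetical mask-then-impute KNN anomaly score.

    Mask each feature of a validation sample in turn, impute it as the mean
    of that feature over the k nearest normal training samples (neighbours
    found on the unmasked features), and score by the mean squared error."""
    n_val, d = X_val.shape
    scores = np.zeros(n_val)
    for j in range(d):
        rest = [c for c in range(d) if c != j]               # unmasked features
        nn = NearestNeighbors(n_neighbors=k).fit(X_train[:, rest])
        _, idx = nn.kneighbors(X_val[:, rest])                # (n_val, k) neighbour indices
        imputed = X_train[idx, j].mean(axis=1)                # impute masked feature j
        scores += (X_val[:, j] - imputed) ** 2
    return scores / d                                         # higher = more anomalous
```

Scoring against only normal training samples mirrors the one-class setting described in the Dataset Splits row.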
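
A minimal sketch of the train/validation protocol quoted under Dataset Splits, assuming binary labels with 1 marking anomalies; the function name and seeding are illustrative.

```python
import numpy as np

def make_split(X, y, seed=0):
    """Training set: a random 50% subsample of the normal samples (y == 0).
    Validation set: the remaining 50% of normals plus all anomalies (y == 1)."""
    rng = np.random.default_rng(seed)
    normal_idx = rng.permutation(np.flatnonzero(y == 0))
    anomaly_idx = np.flatnonzero(y == 1)
    half = len(normal_idx) // 2
    train_idx = normal_idx[:half]
    val_idx = np.concatenate([normal_idx[half:], anomaly_idx])
    return X[train_idx], X[val_idx], y[val_idx]
```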
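
The LAMB optimizer and Lookahead wrapper come from third-party packages and are not reproduced here; the sketch below only illustrates the flat-then-anneal learning-rate schedule quoted under Experiment Setup (flat at the base rate for 70% of steps, then a cosine anneal to 0), assuming a PyTorch LambdaLR-based implementation rather than the authors' exact code.

```python
import math
from torch.optim.lr_scheduler import LambdaLR

def flat_then_anneal(optimizer, total_steps, flat_frac=0.7):
    """Keep the base learning rate for the first `flat_frac` of steps,
    then anneal it to 0 with a cosine schedule."""
    flat_steps = int(flat_frac * total_steps)

    def lr_lambda(step):
        if step < flat_steps:
            return 1.0                                       # flat phase
        progress = (step - flat_steps) / max(1, total_steps - flat_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))    # cosine anneal to 0

    return LambdaLR(optimizer, lr_lambda)

# Usage (illustrative): call scheduler.step() after each optimizer step, and
# apply torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0) for the
# gradient clipping at 1 mentioned in the setup.
```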