MCM: Masked Cell Modeling for Anomaly Detection in Tabular Data
Authors: Jiaxin Yin, Yuanyuan Qiao, Zitang Zhou, Xiangchao Wang, Jie Yang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show our method achieves state-of-the-art performance. We also discuss the interpretability from the perspective of each individual feature and correlations between features. Code is released at https://github.com/JXYin24/MCM. Supporting sections: 4 EXPERIMENTS, 4.1 MAIN RESULTS, 4.2 ABLATION STUDIES. |
| Researcher Affiliation | Academia | Beijing University of Posts and Telecommunications, Beijing, China; Hangzhou Dianzi University |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Code is released at https://github.com/JXYin24/MCM. |
| Open Datasets | Yes | In our study, we select 20 commonly used tabular datasets in AD, which span diverse domains, including healthcare, finance, social sciences, etc. The datasets comprise 12 sourced from ODDS (Rayana, 2016) and 8 from ADBench (Han et al., 2022). |
| Dataset Splits | No | We randomly partition the normal data of each dataset into two equal halves. The training dataset consists of one half of the normal data, while the testing dataset comprises the other half of the normal data combined with all the abnormal instances. (No explicit mention of a separate validation split; a sketch of this split procedure appears after the table.) |
| Hardware Specification | Yes | The experiments were conducted on a single Tesla V100 GPU. |
| Software Dependencies | Yes | Our code is implemented based on the PyTorch 1.10.2 framework with Python 3.6. Other critical package requirements include torchvision 0.11.3, numpy 1.23.5, pandas 1.5.3, and scipy 1.10.1. |
| Experiment Setup | Yes | In detail, we set epochs to 200, the batch size to 512, and fix the number of masking matrices at 15. The temperature τ in the diversity loss is set to 0.1. Most layers of the AE have 256 hidden units, except for the bottleneck layer, which has 128. A fixed set of hyperparameters is effective across all datasets, with dataset dimensions ranging widely from 6 to 500. The only two parameters tuned per dataset are the learning rate and the weight λ that balances the two parts of the loss. The Adam optimizer is employed with an exponentially decaying learning-rate schedule. (A hedged training sketch based on these settings follows the table.) |
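The split procedure quoted under "Dataset Splits" is simple enough to pin down in code. Below is a minimal sketch, assuming a feature matrix `X` and binary labels `y` with `1` marking anomalies; the function name and random seed are illustrative, while the 50/50 partition of normal data follows the quoted description.

```python
import numpy as np

def split_half_normal(X, y, seed=0):
    """Train on one half of the normal data; test on the other half
    of the normal data plus all anomalies, per the paper's protocol."""
    rng = np.random.default_rng(seed)
    normal_idx = np.flatnonzero(y == 0)
    anomaly_idx = np.flatnonzero(y == 1)
    rng.shuffle(normal_idx)
    half = len(normal_idx) // 2
    train_idx = normal_idx[:half]
    test_idx = np.concatenate([normal_idx[half:], anomaly_idx])
    return X[train_idx], X[test_idx], y[test_idx]
```

Note that no validation indices are produced, consistent with the absence of an explicit validation split in the paper.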
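Likewise, the fixed hyperparameters quoted under "Experiment Setup" map onto a straightforward PyTorch training scaffold. The sketch below is an assumption-laden illustration, not the authors' released code: the 256-unit hidden layers, 128-unit bottleneck, 200 epochs, batch size 512, 15 masking matrices, and τ = 0.1 come from the quoted text, while the network depth, input dimension, learning rate, and decay factor `gamma` are hypothetical placeholders (the paper tunes the learning rate and the loss weight λ per dataset).

```python
import torch
import torch.nn as nn

EPOCHS, BATCH_SIZE, NUM_MASKS, TAU = 200, 512, 15, 0.1  # fixed across datasets (paper)

def build_autoencoder(input_dim: int) -> nn.Module:
    # 256 hidden units in most layers, 128 at the bottleneck, per the paper;
    # one hidden layer per side is an illustrative assumption.
    return nn.Sequential(
        nn.Linear(input_dim, 256), nn.ReLU(),
        nn.Linear(256, 128), nn.ReLU(),   # bottleneck
        nn.Linear(128, 256), nn.ReLU(),
        nn.Linear(256, input_dim),
    )

model = build_autoencoder(input_dim=32)                    # input_dim is a placeholder
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr is tuned per dataset
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.98)  # gamma assumed

for epoch in range(EPOCHS):
    # a per-batch loop would compute the MCM loss here:
    # reconstruction loss + λ * diversity loss (temperature TAU)
    scheduler.step()  # exponentially decaying learning rate, per the paper
```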