MCM: Masked Cell Modeling for Anomaly Detection in Tabular Data

Authors: Jiaxin Yin, Yuanyuan Qiao, Zitang Zhou, Xiangchao Wang, Jie Yang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show our method achieves state-of-the-art performance. We also discuss the interpretability from the perspective of each individual feature and correlations between features. Code is released at https://github.com/JXYin24/MCM. (See also Section 4, Experiments; 4.1, Main Results; 4.2, Ablation Studies.)
Researcher Affiliation | Academia | Beijing University of Posts and Telecommunications, Beijing, China; Hangzhou Dianzi University
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Code is released at https://github.com/JXYin24/MCM.
Open Datasets | Yes | In our study, we select 20 commonly used tabular datasets in AD, which span diverse domains, including healthcare, finance, social sciences, etc. The datasets comprise 12 sourced from ODDS (Rayana, 2016) and 8 from ADBench (Han et al., 2022).
Dataset Splits | No | We randomly partition the normal data of each dataset into two equal halves. The training dataset consists of one half of the normal data, while the testing dataset comprises the other half of the normal data combined with all the abnormal instances. (No explicit mention of a separate validation split.)
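For concreteness, a minimal sketch of the quoted split protocol, assuming a feature matrix X and a binary label vector y (0 = normal, 1 = anomaly). The names, the use of NumPy, and the seed handling are illustrative, not the authors' code, and no validation split is created since the paper does not mention one.

```python
import numpy as np

def split_half_normals(X, y, seed=0):
    """Train on half of the normal data; test on the other half plus all anomalies."""
    rng = np.random.default_rng(seed)
    normal_idx = rng.permutation(np.flatnonzero(y == 0))   # shuffled normal rows
    anomaly_idx = np.flatnonzero(y == 1)                    # all abnormal rows
    half = len(normal_idx) // 2
    train_idx = normal_idx[:half]
    test_idx = np.concatenate([normal_idx[half:], anomaly_idx])
    return X[train_idx], X[test_idx], y[test_idx]
```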
Hardware Specification | Yes | The experiments were conducted on a single Tesla V100 GPU.
Software Dependencies | Yes | Our code is implemented on the PyTorch 1.10.2 framework with Python 3.6. Other critical package requirements include torchvision 0.11.3, numpy 1.23.5, pandas 1.5.3, and scipy 1.10.1.
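The quoted pins could be captured in a dependency file; a possible requirements.txt, assuming the listed versions are the only constraints (the file name and layout are not taken from the released repository):

```
# versions as quoted in the paper
torch==1.10.2
torchvision==0.11.3
numpy==1.23.5
pandas==1.5.3
scipy==1.10.1
```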
Experiment Setup | Yes | In detail, we set epochs to 200, the batch size to 512, and fix the number of masking matrices at 15. The temperature τ in the diversity loss is set to 0.1. The number of hidden units in most AE layers is 256, except for the bottleneck layer, which has 128. A fixed set of hyperparameters is effective across all datasets, with dataset dimensions ranging widely from 6 to 500. The only two parameters tuned per dataset are the learning rate and the weight λ, which balances the two parts of the loss. The Adam optimizer is employed with an exponentially decaying learning rate schedule.
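A hedged sketch of that configuration in PyTorch: the dictionary layout, the placeholder learning rate and λ values, the decay factor gamma, and the make_optimizer helper are assumptions for illustration, not the authors' released training code.

```python
import torch

# Hyperparameters as quoted above; lr and lambda_weight are tuned per dataset,
# so the values below are placeholders only.
config = dict(
    epochs=200,
    batch_size=512,
    num_masks=15,        # number of masking matrices
    tau=0.1,             # temperature in the diversity loss
    hidden_dim=256,      # hidden size of most AE layers
    bottleneck_dim=128,  # bottleneck layer size
    lr=1e-3,             # placeholder; tuned per dataset
    lambda_weight=1.0,   # placeholder; weight balancing the two loss terms
)

def make_optimizer(model, cfg):
    # Adam with an exponentially decaying learning-rate schedule, per the quote;
    # the decay factor gamma=0.95 is an assumption.
    optimizer = torch.optim.Adam(model.parameters(), lr=cfg["lr"])
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
    return optimizer, scheduler
```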