Hierarchical Vector Quantized Transformer for Multi-class Unsupervised Anomaly Detection
Authors: Ruiying Lu, YuJie Wu, Long Tian, Dongsheng Wang, Bo Chen, Xiyang Liu, Ruimin Hu
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By evaluating on MVTec-AD and VisA datasets, our model surpasses the state-of-the-art alternatives and possesses good interpretability. The code is available at https://github.com/RuiyingLu/HVQ-Trans. |
| Researcher Affiliation | Academia | Ruiying Lu1, YuJie Wu2, Long Tian2*, Dongsheng Wang3, Bo Chen3, Xiyang Liu2, Ruimin Hu1. School of Cyber Engineering1, Software Engineering Institute2, National Key Laboratory of Radar Signal Processing3, Xidian University. {luruiying,tianlong}@xidian.edu.cn |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/RuiyingLu/HVQ-Trans. |
| Open Datasets | Yes | MVTec-AD [2] is a widely-used industrial anomaly detection dataset with 15 classes... VisA [45] is a recently published large dataset... CIFAR-10 [45] is a classical image classification dataset of 10 categories. |
| Dataset Splits | Yes | For each class, the training samples are normal while the test samples can be either normal or anomalous. In order to implement many-versus-many anomaly detection, we select 5 normal classes while the rest classes are viewed as anomalies. |
| Hardware Specification | Yes | Our model is trained for 1000 epochs on 2 GPUs (NVIDIA GeForce RTX 3080 10GB) with batch size 16. |
| Software Dependencies | No | The paper mentions software like EfficientNet and AdamW but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | The input image size of MVTec-AD is 224×224×3... The feature maps become 14×14×272, namely, the patch size is 16. Then we reduce the channel dimension of each patch into 256, followed by feeding them into a 4-layer vanilla Trans-enc and the corresponding 4-layer VQ-Trans-dec. We use AdamW [53] with weight decay 0.0001 for optimization. Our model is trained for 1000 epochs... with batch size 16. The learning rate is initialized as 1×10⁻⁴ and dropped by 0.1 after 800 epochs. |
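
The many-versus-many protocol quoted in the Dataset Splits row (train only on a set of normal classes, treat images from the remaining classes as anomalies at test time) can be sketched as follows for CIFAR-10. This is a minimal illustration, not the authors' data pipeline; in particular, the choice of which 5 classes form the normal set is an assumption made here for demonstration.

```python
# Sketch of a many-versus-many split on CIFAR-10: training data is drawn only
# from the normal classes, and test images from the other classes are labeled
# anomalous. The particular normal set [0..4] is an illustrative choice.
import numpy as np
from torchvision.datasets import CIFAR10

NORMAL_CLASSES = [0, 1, 2, 3, 4]  # illustrative 5 normal classes

train = CIFAR10("./data", train=True, download=True)
test = CIFAR10("./data", train=False, download=True)

train_targets = np.array(train.targets)
# Normal-only training set, as required for unsupervised anomaly detection.
train_indices = np.where(np.isin(train_targets, NORMAL_CLASSES))[0]

test_targets = np.array(test.targets)
# Binary evaluation labels: 0 = normal class, 1 = anomalous class.
anomaly_labels = (~np.isin(test_targets, NORMAL_CLASSES)).astype(int)
```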
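The Experiment Setup row pins down concrete hyperparameters. Below is a minimal PyTorch sketch of that training configuration; the stand-in model, backbone, and data loading are hypothetical placeholders, while the feature dimensions, optimizer, weight decay, learning-rate schedule, epoch count, and batch size follow the values quoted from the paper.

```python
# Sketch of the reported training configuration (not the authors' released code).
# Only the numeric constants come from the paper; the model is a placeholder.
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import MultiStepLR

IMAGE_SHAPE = (3, 224, 224)    # MVTec-AD input size 224 x 224 x 3
FEATURE_SHAPE = (272, 14, 14)  # backbone feature map; patch size 16
EMBED_DIM = 256                # per-patch channels after reduction
NUM_LAYERS = 4                 # 4-layer Trans-enc and 4-layer VQ-Trans-dec
BATCH_SIZE = 16
NUM_EPOCHS = 1000

# Hypothetical stand-in for the hierarchical VQ transformer.
model = nn.Sequential(
    nn.Conv2d(FEATURE_SHAPE[0], EMBED_DIM, kernel_size=1),  # channel reduction 272 -> 256
    nn.Flatten(start_dim=2),                                # 14 x 14 patches -> 196 tokens
)

# AdamW with weight decay 1e-4; lr starts at 1e-4 and drops by 0.1 after 800 epochs.
optimizer = AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)
scheduler = MultiStepLR(optimizer, milestones=[800], gamma=0.1)

for epoch in range(NUM_EPOCHS):
    # ... one pass over the anomaly-free training images would go here ...
    scheduler.step()
```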