Rethinking Reverse Distillation for Multi-Modal Anomaly Detection

Authors: Zhihao Gu, Jiangning Zhang, Liang Liu, Xu Chen, Jinlong Peng, Zhenye Gan, Guannan Jiang, Annan Shu, Yabiao Wang, Lizhuang Ma

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments show that our MMRD outperforms recent state-of-the-art methods on both anomaly detection and localization on MVTec-3D AD and Eyecandies benchmarks."
Researcher Affiliation | Collaboration | Zhihao Gu (1*), Jiangning Zhang (2), Liang Liu (2), Xu Chen (2), Jinlong Peng (2), Zhenye Gan (2), Guannan Jiang (3), Annan Shu (3), Yabiao Wang (2), Lizhuang Ma (1). Affiliations: 1 = School of Electronic and Electrical Engineering, Shanghai Jiao Tong University; 2 = YouTu Lab, Tencent; 3 = Contemporary Amperex Technology Co. Limited (CATL)
Pseudocode | No | "The overall paradigm is shown in Fig. 3 and the algorithm table summarizing the proposed method is included in the supplementary material."
Open Source Code | No | "Codes will be available upon acceptance."
Open Datasets | Yes | "We conduct experiments on two multi-modal benchmarks, i.e., the MVTec 3D-AD (Bergmann et al. 2022) and the Eyecandies (Bonfiglioli et al. 2022)."
Dataset Splits | No | The paper mentions training data and evaluation metrics, but it does not explicitly provide the train/validation/test dataset splits (e.g., percentages or sample counts) needed for reproduction.
Hardware Specification | No | The paper mentions 'GPUH' (GPU hours) and 'FPS' (frames per second) in Table 3, indicating the use of GPUs, but it does not specify any particular GPU model (e.g., NVIDIA A100, RTX 3090) or CPU model used for the experiments.
Software Dependencies | No | The paper names 'Adam' as the optimizer but does not provide version numbers for any software dependencies, such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | "Images are resized to 256×256 and Adam is used as the optimizer with a learning rate of 0.001. The model is trained for 400 epochs with a batch size of 16. The number of prototypes is set to 50. The teacher network is a pre-trained WideResNet-50 and the student is the same as in RD. We adopt depth and normals as the auxiliary modalities for the MVTec 3D-AD and Eyecandies datasets, respectively."
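The reported hyper-parameters can be collected into a single configuration sketch for re-implementation. This is a minimal illustration only: the class and field names below are assumptions, not identifiers from the authors' (unreleased) code.

```python
# Hyper-parameters quoted in the Experiment Setup row above.
# All names here (MMRDConfig and its fields) are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class MMRDConfig:
    image_size: tuple = (256, 256)        # input images resized to 256x256
    optimizer: str = "Adam"               # optimizer named in the paper
    learning_rate: float = 1e-3           # learning rate of 0.001
    epochs: int = 400                     # trained for 400 epochs
    batch_size: int = 16                  # batch size of 16
    num_prototypes: int = 50              # number of prototypes set to 50
    teacher_backbone: str = "wide_resnet50"  # pre-trained WideResNet-50 teacher
    aux_modality_mvtec3d: str = "depth"   # auxiliary modality for MVTec 3D-AD
    aux_modality_eyecandies: str = "normals"  # auxiliary modality for Eyecandies


cfg = MMRDConfig()
print(cfg.optimizer, cfg.learning_rate, cfg.epochs)
```

Freezing the dataclass keeps the reported settings immutable once constructed, which makes it harder to silently drift from the published configuration during experiments.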