AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models

Authors: Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Ming Tang, Jinqiao Wang

AAAI 2024

Reproducibility Variable — Result — LLM Response
Research Type — Experimental — "We conduct extensive experiments on the MVTec-AD (Bergmann et al. 2019) and VisA (Zou et al. 2022) datasets. With unsupervised training on the MVTec-AD dataset, we achieve an accuracy of 93.3%, an image-level AUC of 97.4%, and a pixel-level AUC of 93.1%." (Table 2: Few-shot IAD results on MVTec-AD and VisA datasets. Table 4: Results of ablation studies.)
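The image-level and pixel-level AUC figures quoted above are standard AUROC metrics over per-image and per-pixel anomaly scores. A minimal sketch of how they are typically computed, using scikit-learn on hypothetical data rather than the authors' code:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical example: 4 test images, each with a 224x224 anomaly map.
rng = np.random.default_rng(0)
image_labels = np.array([0, 0, 1, 1])                # 0 = normal, 1 = anomalous image
pixel_masks = rng.integers(0, 2, (4, 224, 224))      # ground-truth pixel masks
anomaly_maps = rng.random((4, 224, 224))             # predicted per-pixel anomaly scores

# Image-level AUROC: score each image by the maximum of its anomaly map.
image_scores = anomaly_maps.reshape(4, -1).max(axis=1)
image_auc = roc_auc_score(image_labels, image_scores)

# Pixel-level AUROC: flatten all masks and maps into one long vector.
pixel_auc = roc_auc_score(pixel_masks.ravel(), anomaly_maps.ravel())
```

The max-pooling step for the image-level score is one common convention in industrial anomaly detection; the paper's exact aggregation may differ.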
Researcher Affiliation — Collaboration — 1) Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; 2) School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; 3) Objecteye Inc., Beijing, China
Pseudocode — No — The paper describes methods in prose and uses equations, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code — No — The paper does not contain any explicit statement about releasing source code, nor a link to a code repository.
Open Datasets — Yes — "We conduct experiments primarily on the MVTec-AD (Bergmann et al. 2019) and VisA (Zou et al. 2022) datasets."
Dataset Splits — No — "The MVTec-AD dataset comprises 3629 training images and 1725 testing images across 15 different categories... The training images only consist of normal images, while the testing images contain both normal and anomalous images. Consistent with previous IAD methods, we only use the normal data from these datasets for training." (No explicit mention of a validation set or split; only training and testing.)
Hardware Specification — Yes — "Training is conducted on two RTX-3090 GPUs over 50 epochs, with a learning rate of 1e-3 and a batch size of 16."
Software Dependencies — No — The paper mentions specific models such as 'Vicuna-7B', 'PandaGPT', and 'ImageBind-Huge', but does not provide version numbers for these or for other software dependencies such as programming languages or libraries.
Experiment Setup — Yes — "We set the image resolution at 224×224 and feed the outputs from the 8th, 16th, 24th, and 32nd layers of ImageBind-Huge's image encoder to the image decoder. Training is conducted on two RTX-3090 GPUs over 50 epochs, with a learning rate of 1e-3 and a batch size of 16. Linear warm-up and a one-cycle cosine learning rate decay strategy are applied."
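The learning-rate schedule quoted above (linear warm-up followed by a one-cycle cosine decay from the base rate of 1e-3) can be sketched as a plain function. The warm-up fraction and floor are assumptions here, since the paper does not state them:

```python
import math

def lr_at(step, total_steps, base_lr=1e-3, warmup_frac=0.1, min_lr=0.0):
    """Linear warm-up to base_lr, then one-cycle cosine decay to min_lr."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        # Linear ramp from ~0 up to base_lr over the warm-up phase.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps (progress goes 0 -> 1).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

In practice such a schedule would be called once per optimizer step (here, `total_steps` would be 50 epochs' worth of batches of size 16); frameworks like PyTorch provide equivalent built-in schedulers.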