CLIP-FSAC: Boosting CLIP for Few-Shot Anomaly Classification with Synthetic Anomalies

Authors: Zuo Zuo, Yao Wu, Baoqiang Li, Jiahao Dong, You Zhou, Lei Zhou, Yanyun Qu, Zongze Wu

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiment results are provided for evaluating our method in few-normal-shot anomaly classification, which outperforms the state-of-the-art method by 12.2%, 10.9%, and 10.4% AUROC on VisA for the 1-, 2-, and 4-shot settings.
Researcher Affiliation | Academia | Zuo Zuo (1,2), Yao Wu (3), Baoqiang Li (2), Jiahao Dong (2), You Zhou (4), Lei Zhou (2), Yanyun Qu (3), Zongze Wu (1,2,4). (1) National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University; (2) Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ); (3) Xiamen University; (4) Shenzhen University. Emails: Nostalgiaz@stu.xjtu.edu.cn, {libaoqiang2023, dongjiahao2023}@email.szu.edu.cn, yyqu@xmu.edu.cn, wuyao@stu.xmu.edu.cn, zzwu@szu.edu.cn
Pseudocode | No | The paper describes its method in prose and diagrams (Figure 2) but does not include structured pseudocode or an algorithm block.
Open Source Code | No | The paper does not provide an explicit statement about releasing its source code, nor does it include a link to a code repository.
Open Datasets | Yes | Our experiments are conducted on both the MVTec AD [Bergmann et al., 2019] and VisA [Zou et al., 2022] datasets.
Dataset Splits | No | The paper mentions a 'training set' and a 'test set' but does not explicitly describe a separate validation split or its size/proportion.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU or CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions OpenCLIP and the Adam optimizer but does not provide version numbers for these or for any other software dependencies required for replication.
Experiment Setup | Yes | We utilize OpenCLIP and load checkpoints pre-trained on LAION-400M [Schuhmann et al., 2021] into the image and text encoders, as in WinCLIP. Meanwhile, we directly use the CPE (compositional prompt ensemble) proposed in WinCLIP. In our method, we use random perturbation to generate anomalies on MVTec AD and NSA on VisA. Our image adapter consists of a two-layer multi-layer perceptron (MLP), and the text adapter is composed of one MLP. We use the Adam optimizer with a learning rate of 0.0005 for the image adapter and 0.0001 for the text adapter. The number of training epochs is set to 100 for all datasets. Our few-shot anomaly classification settings are 1-shot, 2-shot, and 4-shot, so the batch size is set to the number of training samples. α1 and α2 in Equation 3 are both set to 1 for MVTec AD and VisA. β1 and β2 in Equation 4 are set to 0.1 and 0.9, respectively, for the MVTec AD dataset, but both are set to 1 for VisA.
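
For replication purposes, the adapter and optimizer configuration quoted in the Experiment Setup row can be sketched as follows. This is a minimal PyTorch reconstruction under stated assumptions, not the authors' code: the image adapter's hidden width and the CLIP embedding dimension (512, typical of ViT-B OpenCLIP backbones) are assumed, and the exact forms of Equations 3 and 4 are not given in the excerpt, so only the quoted constants appear as comments.

```python
# Minimal PyTorch sketch of the quoted training setup; hidden width and
# embedding dimension are assumptions, not values from the paper.
import torch
import torch.nn as nn

EMBED_DIM = 512  # assumed CLIP feature width; depends on the chosen OpenCLIP backbone


class ImageAdapter(nn.Module):
    """Two-layer MLP image adapter, per the paper's description."""

    def __init__(self, dim: int = EMBED_DIM, hidden: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mlp(x)


class TextAdapter(nn.Module):
    """Text adapter composed of one MLP (here a single linear layer)."""

    def __init__(self, dim: int = EMBED_DIM):
        super().__init__()
        self.mlp = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mlp(x)


image_adapter = ImageAdapter()
text_adapter = TextAdapter()

# Separate Adam optimizers with the two learning rates quoted above.
opt_image = torch.optim.Adam(image_adapter.parameters(), lr=5e-4)
opt_text = torch.optim.Adam(text_adapter.parameters(), lr=1e-4)

EPOCHS = 100  # "The number of training epochs is set to 100 for all datasets."

# Constants quoted for Equations 3 and 4; the equations themselves are not
# reproduced in this report, so only the weights are recorded here.
ALPHA_1, ALPHA_2 = 1.0, 1.0  # Eq. 3, both MVTec AD and VisA
BETA_1, BETA_2 = 0.1, 0.9    # Eq. 4, MVTec AD (both 1.0 for VisA)
```

Note that because the batch size equals the number of normal shots (1, 2, or 4), each of the 100 epochs amounts to a single gradient step per adapter.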
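The excerpt does not define the "random perturbation" used to synthesize anomalies on MVTec AD (NSA, used on VisA, refers to the Natural Synthetic Anomalies method of Schlüter et al., 2022, which blends patches between images via Poisson image editing). As a loose illustration only, and not the paper's procedure, a cut-paste-style perturbation that turns a normal image into a pseudo-anomalous one could look like this:

```python
# Illustrative cut-paste-style random perturbation; NOT the paper's exact
# synthesis procedure, which the excerpt does not specify.
import torch


def random_perturbation(image: torch.Tensor, max_frac: float = 0.3) -> torch.Tensor:
    """Return a pseudo-anomalous copy of `image` (shape (C, H, W), values in [0, 1]).

    A randomly sized patch is copied from one location, flipped, and pasted at
    another; the patch-size bounds and the flip are illustrative defaults.
    """
    _, h, w = image.shape
    ph = int(torch.randint(h // 16 + 1, max(h // 16 + 2, int(h * max_frac)), (1,)))
    pw = int(torch.randint(w // 16 + 1, max(w // 16 + 2, int(w * max_frac)), (1,)))
    sy = int(torch.randint(0, h - ph + 1, (1,)))  # source patch corner
    sx = int(torch.randint(0, w - pw + 1, (1,)))
    dy = int(torch.randint(0, h - ph + 1, (1,)))  # destination corner
    dx = int(torch.randint(0, w - pw + 1, (1,)))
    out = image.clone()
    patch = image[:, sy:sy + ph, sx:sx + pw]
    out[:, dy:dy + ph, dx:dx + pw] = torch.flip(patch, dims=[2])  # flip along width
    return out
```

In training, such perturbed images would be labeled anomalous and paired with the k normal shots; CLIP-FSAC's actual generation procedure on MVTec AD may differ substantially from this sketch.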