SeTAR: Out-of-Distribution Detection with Selective Low-Rank Approximation

Authors: Yixia Li, Boya Xiong, Guanhua Chen, Yun Chen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluations on ImageNet1K and Pascal-VOC benchmarks show SeTAR's superior performance, reducing the relative false positive rate by up to 18.95% and 36.80% compared to zero-shot and fine-tuning baselines. Our work offers a scalable, efficient solution for OOD detection, setting a new state-of-the-art in this area.
Researcher Affiliation | Collaboration | Yixia Li (1), Boya Xiong (2), Guanhua Chen (1), Yun Chen (3); (1) Southern University of Science and Technology; (2) Shanghai University of Finance and Economics; (3) MoE Key Laboratory of Interdisciplinary Research of Computation and Economics, Shanghai University of Finance and Economics
Pseudocode | Yes | Algorithm 1: the hyperparameter search in SeTAR; and Listing 1: example procedure of SeTAR on ImageNet1K with CLIP-base. (An illustrative sketch of the low-rank step and the greedy search appears after the table.)
Open Source Code | Yes | Code is available at https://github.com/X1AOX1A/SeTAR.
Open Datasets | Yes | Following previous work (Ming et al., 2022; Miyai et al., 2023b), we use two real-world datasets created from ImageNet1K (Deng et al., 2009) and Pascal-VOC (Everingham et al., 2009) as the ID datasets. For OOD datasets, we follow Ming et al. (2022) to preprocess iNaturalist, SUN, Places and Texture, and follow Miyai et al. (2023b) to preprocess ImageNet22K and COCO data.
Dataset Splits | Yes | The ID validation set of ImageNet1K is collected by sampling one image per label from the ImageNet1K training set. For Pascal-VOC, we randomly sample 10% of the images as the ID validation set and leave the rest as the ID test set. (An illustrative split snippet appears after the table.)
Hardware Specification | Yes | All experiments are conducted on a single NVIDIA RTX 4090 GPU.
Software Dependencies | No | The paper mentions specific models such as CLIP ViT-B/16 and Swin Transformer, but it does not list version numbers for software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | We use CLIP ViT-B/16 (Radford et al., 2021) as our backbone. Both the image and text encoders have 12 layers. The rank reduction ratio candidates range from 0 to 40% in 5% intervals. We use a temperature of 15 unless stated otherwise. The hyperparameters for SeTAR are shown in Table 12, and those for SeTAR+FT and LoRA in Table 13. (An illustrative scoring snippet using this temperature appears after the table.)
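The paper's Algorithm 1 is not reproduced here, but the two pieces it combines can be sketched: a truncated-SVD low-rank step and a greedy per-layer search over the 0-40% reduction-ratio candidates. The following is a minimal sketch, not the authors' implementation; the function names (low_rank_approx, greedy_ratio_search, score_fn) are ours, and we assume the low-rank step keeps the top singular values. The released code is authoritative.

```python
import torch

def low_rank_approx(weight: torch.Tensor, reduction_ratio: float) -> torch.Tensor:
    """Truncated-SVD approximation of a weight matrix.

    Keeps the top (1 - reduction_ratio) fraction of singular values
    (an assumption of this sketch) and discards the rest; a ratio of
    0.0 returns the matrix unchanged.
    """
    if reduction_ratio <= 0.0:
        return weight
    u, s, vh = torch.linalg.svd(weight, full_matrices=False)
    k = max(1, int(round(s.numel() * (1.0 - reduction_ratio))))
    return (u[:, :k] * s[:k]) @ vh[:k, :]

def greedy_ratio_search(layer_names, candidate_ratios, score_fn):
    """Greedy per-layer search: visit layers one at a time and keep the
    reduction ratio that maximizes a validation OOD score (score_fn,
    hypothetical) given the ratios already fixed for earlier layers."""
    chosen = {}
    for name in layer_names:
        best_ratio = 0.0
        best_score = score_fn({**chosen, name: 0.0})
        for ratio in candidate_ratios:
            score = score_fn({**chosen, name: ratio})
            if score > best_score:
                best_ratio, best_score = ratio, score
        chosen[name] = best_ratio
    return chosen

# Candidates from the Experiment Setup row: 0 to 40% in 5% steps.
CANDIDATE_RATIOS = [i * 0.05 for i in range(9)]
```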
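The temperature of 15 from the Experiment Setup row enters the softmax-based OOD score. SeTAR builds on MCM-style scoring (Ming et al., 2022); the sketch below assumes the temperature divides the cosine similarities before the softmax, following the MCM formulation — the released code should be consulted for the exact convention, and mcm_score and its arguments are illustrative names.

```python
import torch
import torch.nn.functional as F

def mcm_score(image_feat: torch.Tensor, text_feats: torch.Tensor,
              temperature: float = 15.0) -> torch.Tensor:
    """MCM-style OOD score: max softmax over temperature-scaled cosine
    similarities between image embeddings and class text embeddings.

    image_feat: (d,) or (B, d); text_feats: (C, d).
    Returns one score per image; higher means more ID-like.
    """
    image_feat = F.normalize(image_feat, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    sims = image_feat @ text_feats.T                # (B, C) or (C,)
    probs = F.softmax(sims / temperature, dim=-1)   # assumed convention
    return probs.max(dim=-1).values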
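The Pascal-VOC split described in the Dataset Splits row is a simple random hold-out. A minimal sketch follows; the function name is illustrative and the seed is arbitrary, as the paper does not state one.

```python
import random

def split_ids(image_ids, val_fraction=0.10, seed=0):
    """Randomly hold out a fraction of Pascal-VOC images as the ID
    validation set; the remainder form the ID test set."""
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)
    n_val = int(len(ids) * val_fraction)
    return ids[:n_val], ids[n_val:]   # (val_ids, test_ids)
```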