SeTAR: Out-of-Distribution Detection with Selective Low-Rank Approximation
Authors: Yixia Li, Boya Xiong, Guanhua Chen, Yun Chen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations on ImageNet1K and Pascal-VOC benchmarks show SeTAR's superior performance, reducing the relative false positive rate by up to 18.95% and 36.80% compared to zero-shot and fine-tuning baselines. Our work offers a scalable, efficient solution for OOD detection, setting a new state-of-the-art in this area. |
| Researcher Affiliation | Collaboration | Yixia Li (1), Boya Xiong (2), Guanhua Chen (1), Yun Chen (3); (1) Southern University of Science and Technology; (2) Shanghai University of Finance and Economics; (3) MoE Key Laboratory of Interdisciplinary Research of Computation and Economics, Shanghai University of Finance and Economics |
| Pseudocode | Yes | Algorithm 1: The hyperparameter search in SeTAR. Listing 1: Example procedure of SeTAR on ImageNet1K with CLIP-base. (A minimal sketch of the greedy search appears after this table.) |
| Open Source Code | Yes | Code is available at https://github.com/X1AOX1A/SeTAR. |
| Open Datasets | Yes | Following previous work (Ming et al., 2022; Miyai et al., 2023b), we use two real-world datasets created from ImageNet1K (Deng et al., 2009) and Pascal-VOC (Everingham et al., 2009) as the ID datasets. For OOD datasets, we follow Ming et al. (2022) to preprocess iNaturalist, SUN, Places and Texture, and follow Miyai et al. (2023b) to preprocess ImageNet22K and COCO data. |
| Dataset Splits | Yes | The ID validation set of ImageNet1K is collected by sampling one image for each label from the ImageNet1K training set. For Pascal-VOC, we randomly sample 10% of the images as the ID validation set and leave the rest as the ID test set. (A sketch of these splits appears after this table.) |
| Hardware Specification | Yes | All experiments are conducted on a single NVIDIA RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions using specific models like 'CLIP ViT-B/16' and 'Swin Transformer', but it does not list specific version numbers for software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We use CLIP ViT-B/16 (Radford et al., 2021) as our backbone. Both image and text encoders have 12 layers. The rank reduction ratio candidates range from 0 to 40% in 5% intervals. We use a temperature of 15, unless stated otherwise. The hyperparameters for SeTAR are shown in Table 12, and the hyperparameters for SeTAR+FT and LoRA are shown in Table 13. (A sketch of the candidate grid and scoring setup appears after this table.) |
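
On the pseudocode row: Algorithm 1 greedily searches per-layer rank-reduction ratios. The sketch below is our reading of that idea, not the authors' implementation; `layers`, `candidates`, and `score_fn` (a validation OOD metric such as FPR95, where lower is better) are hypothetical names, and the truncated-SVD weight update is an assumption about how the selective low-rank approximation is applied.

```python
import torch

def low_rank_approx(weight: torch.Tensor, reduction_ratio: float) -> torch.Tensor:
    # Reconstruct the weight from its truncated SVD, dropping the smallest
    # singular components according to the rank-reduction ratio.
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    keep = max(1, int(S.numel() * (1.0 - reduction_ratio)))
    return U[:, :keep] @ torch.diag(S[:keep]) @ Vh[:keep, :]

@torch.no_grad()
def greedy_ratio_search(layers, candidates, score_fn):
    # Visit layers in a fixed order; for each, keep the candidate ratio that
    # improves the validation score, otherwise leave the layer unmodified.
    chosen = []
    best_score = score_fn()  # score with no modification
    for layer in layers:
        original = layer.weight.data.clone()
        best_ratio = 0.0
        for ratio in candidates:
            layer.weight.data = low_rank_approx(original, ratio)
            score = score_fn()
            if score < best_score:
                best_score, best_ratio = score, ratio
        # Restore the original weight if no candidate helped.
        layer.weight.data = original if best_ratio == 0.0 else low_rank_approx(original, best_ratio)
        chosen.append(best_ratio)
    return chosen
```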
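On the dataset-splits row: the split construction described in the paper is simple to reproduce. Here is a minimal sketch assuming the data is a list of `(image_path, label)` pairs; the data format, function names, and seed are our own choices, not from the paper's code.

```python
import random
from collections import defaultdict

def imagenet1k_id_val(train_samples, seed=0):
    # ImageNet1K ID validation set: one image sampled per label
    # from the training set.
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for path, label in train_samples:
        by_label[label].append(path)
    return [(rng.choice(paths), label) for label, paths in sorted(by_label.items())]

def pascal_voc_split(samples, val_fraction=0.10, seed=0):
    # Pascal-VOC: random 10% ID validation split; the rest is the ID test set.
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[:n_val], shuffled[n_val:]
```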
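On the experiment-setup row: the search grid and temperature translate directly into code. The sketch below pairs the 0-to-40% candidate grid with an MCM-style score (maximum softmax over temperature-scaled cosine similarities between CLIP image and class-prompt features); that this matches the paper's exact scoring function is an assumption on our part.

```python
import torch.nn.functional as F

# Rank-reduction ratio candidates: 0 to 40% in 5% intervals.
CANDIDATES = [round(i * 0.05, 2) for i in range(9)]  # [0.0, 0.05, ..., 0.4]

def mcm_score(image_feats, text_feats, temperature=15.0):
    # Maximum softmax probability over temperature-scaled cosine similarities
    # between image features (B, D) and ID class-prompt features (C, D).
    # Higher scores indicate "more in-distribution".
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = image_feats @ text_feats.T / temperature
    return logits.softmax(dim=-1).max(dim=-1).values
```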