Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey
Authors: Atsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Yueqian Lin, Qing Yu, Go Irie, Shafiq Joty, Yixuan Li, Hai Helen Li, Ziwei Liu, Toshihiko Yamasaki, Kiyoharu Aizawa
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we compare several representative VLM-based OOD detection methods. ... Table 3: Comparison of OOD detection methods across ImageNet, ImageNet-20, and ImageNet-X. We use AUROC for the evaluation of OOD detection. |
| Researcher Affiliation | Collaboration | Atsuyuki Miyai, The University of Tokyo; Jingkang Yang, S-Lab, Nanyang Technological University; Jingyang Zhang, Duke University; Yifei Ming, Salesforce AI Research |
| Pseudocode | No | The paper describes methodologies in natural language and does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The resource is available at https://github.com/AtsuMiyai/Awesome-OOD-VLM. |
| Open Datasets | Yes | In this section, we report results on the widely used ImageNet OOD benchmark (Huang & Li, 2021) and two ImageNet-based hard OOD benchmarks. ... The MVTec-AD dataset (Bergmann et al., 2019) and the VisA dataset (Zou et al., 2022) are commonly used. |
| Dataset Splits | Yes | In the ImageNet OOD benchmark, ImageNet is used as the ID dataset, while datasets such as iNaturalist (Van Horn et al., 2018) serve as OOD datasets. ... ImageNet-20 is used as the ID dataset, and ImageNet-10, which has no overlapping categories, is used as the OOD dataset. ... Both ID and OOD sets consist of 500 classes each. The ID and OOD subsets of ImageNet-X contain 25,000 images each. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using pre-trained models and frameworks like CLIP, GPT-4V, LLaVA, Grounding DINO, and SAM, but does not provide specific version numbers for these or any underlying software dependencies (e.g., Python, PyTorch versions). |
| Experiment Setup | Yes | For CoOp and LoCoOp, we follow the hyperparameter settings from previous studies and train with 16 shots. On the other hand, since IDPrompt requires higher training costs, we follow the original implementation (Bai et al., 2024a) and conduct training with only 1 shot. |
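The Research Type row states that the compared methods are evaluated with AUROC. As a minimal sketch, assuming the common convention that ID images are the positive class and that a higher detector score means "more in-distribution" (the score function itself, such as a CLIP-based similarity, is outside this sketch), AUROC equals the fraction of (ID, OOD) pairs ranked correctly, with ties counted as one half. The function name `auroc` is hypothetical and not taken from the paper or its repository.

```python
from typing import Sequence


def auroc(id_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """AUROC for OOD detection, treating ID samples as positives.

    Assumes higher scores mean "more in-distribution". Returns the
    probability that a random ID sample outscores a random OOD sample,
    counting ties as 1/2 (the normalized Mann-Whitney U statistic).
    """
    if not id_scores or not ood_scores:
        raise ValueError("both score lists must be non-empty")
    correct = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                correct += 1.0   # pair ranked correctly
            elif s_id == s_ood:
                correct += 0.5   # tie counts as half
    return correct / (len(id_scores) * len(ood_scores))


# 8 of the 9 (ID, OOD) pairs are ordered correctly (0.7 < 0.75 is the miss).
print(auroc([0.9, 0.8, 0.7], [0.6, 0.75, 0.2]))
```

The pairwise loop is quadratic in the number of samples, which keeps the definition explicit; for full benchmarks a rank-based implementation such as scikit-learn's `roc_auc_score` would normally be used instead.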
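The Dataset Splits row relies on two properties: the ID and OOD sets share no categories (ImageNet-20 versus ImageNet-10), and each ImageNet-X subset holds 25,000 images over 500 classes, which works out to 50 images per class on average. A minimal sketch of checking both, using hypothetical helper names and placeholder class labels rather than the benchmarks' actual class lists:

```python
from typing import Iterable


def assert_disjoint(id_classes: Iterable[str], ood_classes: Iterable[str]) -> None:
    """Raise ValueError if any class label appears in both ID and OOD sets."""
    overlap = set(id_classes) & set(ood_classes)
    if overlap:
        raise ValueError(f"ID/OOD class overlap: {sorted(overlap)}")


def mean_images_per_class(num_images: int, num_classes: int) -> float:
    """Average number of images per class in one split."""
    return num_images / num_classes


# Placeholder labels (hypothetical): a disjoint split passes silently.
assert_disjoint(["class_a", "class_b"], ["class_c"])

# ImageNet-X figures from the row above: 25,000 images over 500 classes.
print(mean_images_per_class(25_000, 500))  # 50.0
```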
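The Experiment Setup row distinguishes 16-shot training (CoOp, LoCoOp) from 1-shot training (IDPrompt), where a "shot" is one labeled training image per ID class. The exact sampling protocol and random seeds are not given in this summary, so the following is only a generic sketch, assuming per-class sampling without replacement from a list of integer labels; `sample_k_shot` is a hypothetical helper.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def sample_k_shot(labels: Sequence[int], k: int, seed: int = 0) -> List[int]:
    """Return sorted indices of k randomly chosen examples per class.

    Sampling is without replacement and reproducible for a fixed seed.
    Raises ValueError if any class has fewer than k examples.
    """
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # local RNG so global random state is untouched
    chosen: List[int] = []
    for label in sorted(by_class):  # fixed class order keeps results stable
        pool = by_class[label]
        if len(pool) < k:
            raise ValueError(f"class {label} has {len(pool)} examples; need {k}")
        chosen.extend(rng.sample(pool, k))
    return sorted(chosen)


# Toy label list: 10 classes with 100 examples each.
labels = [i % 10 for i in range(1000)]
print(len(sample_k_shot(labels, 16)))  # 160 (16 shots x 10 classes)
print(len(sample_k_shot(labels, 1)))   # 10  (1 shot x 10 classes)
```

Using a dedicated `random.Random` instance keeps the few-shot subset reproducible without affecting other code that draws from the global generator.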