Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

Authors: Atsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Yueqian Lin, Qing Yu, Go Irie, Shafiq Joty, Yixuan Li, Hai Helen Li, Ziwei Liu, Toshihiko Yamasaki, Kiyoharu Aizawa

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we compare several representative VLM-based OOD detection methods. ... Table 3: Comparison of OOD detection methods across ImageNet, ImageNet-20, and ImageNet-X. We use AUROC for the evaluation of OOD detection."
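For context on the evaluation metric quoted above: AUROC for OOD detection can be read as the probability that a randomly chosen ID sample receives a higher detection score than a randomly chosen OOD sample. A minimal NumPy sketch of that computation (the function name `auroc` and the score convention "higher means more ID-like" are assumptions of this sketch, not the paper's code):

```python
import numpy as np

def auroc(id_scores, ood_scores):
    """AUROC for OOD detection: the probability that a randomly chosen
    ID sample scores higher than a randomly chosen OOD sample,
    counting ties as one half. Higher scores are assumed ID-like."""
    id_s = np.asarray(id_scores, dtype=float)
    ood_s = np.asarray(ood_scores, dtype=float)
    # Pairwise comparison of every (ID, OOD) score pair; O(n*m),
    # which is fine for a sketch.
    greater = (id_s[:, None] > ood_s[None, :]).sum()
    ties = (id_s[:, None] == ood_s[None, :]).sum()
    return (greater + 0.5 * ties) / (id_s.size * ood_s.size)

# Example: one OOD sample outscores all ID samples -> 6 of 9 pairs correct.
print(auroc([0.9, 0.8, 0.7], [0.1, 0.2, 0.95]))  # ≈ 0.667
```

An AUROC of 0.5 corresponds to a detector that cannot separate ID from OOD at all, and 1.0 to perfect separation.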
Researcher Affiliation | Collaboration | Atsuyuki Miyai (The University of Tokyo), Jingkang Yang (S-Lab, Nanyang Technological University), Jingyang Zhang (Duke University), Yifei Ming (Salesforce AI Research)
Pseudocode | No | The paper describes methodologies in natural language and does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The resource is available at https://github.com/AtsuMiyai/Awesome-OOD-VLM.
Open Datasets | Yes | "In this section, we report results on the widely used ImageNet OOD benchmark (Huang & Li, 2021) and two ImageNet-based hard OOD benchmarks. ... The MVTec-AD dataset (Bergmann et al., 2019) and VisA dataset (Zou et al., 2022) are commonly used."
Dataset Splits | Yes | "In the ImageNet OOD benchmark, ImageNet is used as the ID dataset, while datasets such as iNaturalist (Van Horn et al., 2018) serve as OOD datasets. ... ImageNet-20 is used as the ID dataset, and ImageNet-10, which has no overlapping categories, is used as the OOD dataset. ... Both ID and OOD sets consist of 500 classes each. The ID and OOD subsets of ImageNet-X contain 25,000 images respectively."
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions pre-trained models and frameworks such as CLIP, GPT-4V, LLaVA, Grounding DINO, and SAM, but does not provide version numbers for them or for underlying software dependencies (e.g., Python, PyTorch).
Experiment Setup | Yes | "For CoOp and LoCoOp, we follow the hyperparameter settings from previous studies and train with 16 shots. On the other hand, since IDPrompt requires higher training costs, we follow the original implementation (Bai et al., 2024a) and conduct training with only 1 shot."
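For readers unfamiliar with the "16 shots" / "1 shot" terminology in the quote above: an N-shot setup samples N labeled training images per ID class. A minimal sketch of that subset construction (the `sample_few_shot` helper and the `(image_path, label)` tuple format are illustrative assumptions, not the paper's code):

```python
import random
from collections import defaultdict

def sample_few_shot(dataset, num_shots, seed=0):
    """Build a few-shot training subset with `num_shots` examples per class.

    `dataset` is a list of (image_path, class_label) pairs. Sampling is
    seeded so the same subset can be reproduced across runs.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for path, label in dataset:
        by_class[label].append((path, label))
    subset = []
    for label in sorted(by_class):
        items = by_class[label]
        # Take at most num_shots examples; classes smaller than
        # num_shots contribute everything they have.
        subset.extend(rng.sample(items, min(num_shots, len(items))))
    return subset

# Toy dataset: 3 classes with 20 images each.
toy = [(f"img_{i}.jpg", i % 3) for i in range(60)]
print(len(sample_few_shot(toy, 16)))  # 48 (16 shots x 3 classes)
print(len(sample_few_shot(toy, 1)))   # 3  (1 shot x 3 classes)
```

Fixing the seed matters for reproducibility here: few-shot results can vary noticeably with which examples are drawn.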