Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Negative Label Guided OOD Detection with Pretrained Vision-Language Models

Authors: Xue Jiang, Feng Liu, Zhen Fang, Hong Chen, Tongliang Liu, Feng Zheng, Bo Han

ICLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that our method NegLabel achieves state-of-the-art performance on various OOD detection benchmarks and generalizes well on multiple VLM architectures.
Researcher Affiliation	Academia	Southern University of Science and Technology; TMLR Group, Hong Kong Baptist University; University of Melbourne; University of Technology Sydney; Huazhong Agricultural University; Mohamed bin Zayed University of Artificial Intelligence; Sydney AI Centre, University of Sydney
Pseudocode	Yes	Algorithm 1: NegMining
Open Source Code	Yes	The codes are available at https://github.com/tmlr-group/NegLabel.
Open Datasets	Yes	We evaluate our method on the ImageNet-1k OOD benchmark (Huang et al., 2021) and compare it with various previous methods. The ImageNet-1k OOD benchmark is a widely used performance validation method that uses the large-scale visual dataset ImageNet-1k as ID data and iNaturalist (Horn et al., 2018), SUN (Xiao et al., 2010), Places (Zhou et al., 2018), and Textures (Cimpoi et al., 2014) as OOD data, covering a diverse range of scenes and semantics.
Dataset Splits	Yes	We evaluate our method on the ImageNet-1k OOD benchmark (Huang et al., 2021) and compare it with various previous methods. The ImageNet-1k OOD benchmark is a widely used performance validation method that uses the large-scale visual dataset ImageNet-1k as ID data and iNaturalist (Horn et al., 2018), SUN (Xiao et al., 2010), Places (Zhou et al., 2018), and Textures (Cimpoi et al., 2014) as OOD data, covering a diverse range of scenes and semantics. We also follow the settings of MCM (Ming et al., 2022a) and use Stanford-Cars (Krause et al., 2013), CUB-200 (Wah et al., 2011), Oxford-Pet (Parkhi et al., 2012), Food-101 (Bossard et al., 2014), and some subsets of ImageNet-1k (Deng et al., 2009) as ID data, and iNaturalist, SUN, Places, and Textures as OOD data to conduct validation experiments on our method.
Hardware Specification	Yes	All the experiments on a single NVIDIA GeForce RTX 3090Ti GPU.
Software Dependencies	Yes	The proposed method NegLabel is implemented with Python 3.9 and PyTorch 1.9.
Experiment Setup	Yes	The NegMining algorithm takes WordNet as the corpus and selects M = 10000 negative labels under η = 0.05. We use the NegLabel score in the sum-softmax form and set τ = 0.01 as the temperature, and use ng = 100 for grouping strategy by default.
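
Illustration (not from the paper or its repository): the Experiment Setup row mentions a sum-softmax score with temperature τ = 0.01 and a grouping strategy with ng = 100. The minimal sketch below shows one plausible reading of that description. It assumes the score is the share of softmax mass that falls on the in-distribution (ID) labels when ID and negative labels are pooled, and that grouping means splitting the negative labels into groups and averaging the per-group scores. The function names, the interleaved split, and the toy similarity values are all hypothetical; the authors' code at the URL above is the authoritative implementation.

```python
import math


def _id_softmax_share(id_sims, neg_sims, tau):
    """Fraction of softmax mass on the ID labels (assumed sum-softmax form)."""
    # Shift by the largest similarity so exp() stays well-behaved for small tau.
    peak = max(max(id_sims), max(neg_sims))
    id_mass = sum(math.exp((s - peak) / tau) for s in id_sims)
    neg_mass = sum(math.exp((s - peak) / tau) for s in neg_sims)
    return id_mass / (id_mass + neg_mass)


def neg_label_score(id_sims, neg_sims, tau=0.01, n_groups=1):
    """Hypothetical sketch of a negative-label OOD score for one image.

    id_sims / neg_sims: cosine similarities between the image embedding and
    the ID / negative label text embeddings (assumed inputs).
    n_groups: number of negative-label groups; per-group scores are averaged.
    """
    if not id_sims or not neg_sims:
        raise ValueError("need at least one ID and one negative similarity")
    n_groups = max(1, min(n_groups, len(neg_sims)))
    # Interleaved split is an assumption; any partition into groups would do.
    groups = [neg_sims[g::n_groups] for g in range(n_groups)]
    return sum(_id_softmax_share(id_sims, grp, tau) for grp in groups) / n_groups


if __name__ == "__main__":
    # Toy values: an ID-like image is closest to an ID label,
    # an OOD-like image is closest to a negative label.
    print(neg_label_score([0.35, 0.10], [0.12, 0.08]))  # close to 1
    print(neg_label_score([0.10, 0.09], [0.35, 0.30]))  # close to 0
```

Under these assumptions a higher score suggests an ID input, and a threshold on the score would separate ID from OOD samples. The low temperature sharpens the softmax so that the single best-matching label dominates the score.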