Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection
Authors: Chentao Cao, Zhun Zhong, Zhanke Zhou, Yang Liu, Tongliang Liu, Bo Han
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, EOE achieves state-of-the-art performance across different OOD tasks and can be effectively scaled to the ImageNet-1K dataset. We employ two widely-used metrics for evaluation: (1) FPR95, the false positive rate of OOD data when the true positive rate is at 95% for ID data, where a lower value indicates better performance; (2) AUROC, the area under the receiver operating characteristic curve, with a higher value signifying superior performance. (A computation sketch for both metrics follows the table.) |
| Researcher Affiliation | Academia | 1TMLR Group, Department of Computer Science, Hong Kong Baptist University 2School of Computer Science and Information Engineering, Hefei University of Technology 3School of Computer Science, University of Nottingham 4Computer Science and Engineering, University of California, Santa Cruz 5Sydney AI Centre, The University of Sydney. |
| Pseudocode | Yes | Algorithm 1 Zero-shot OOD detection with envisioned outlier class labels (an illustrative scoring sketch follows the table) |
| Open Source Code | Yes | The code is publicly available at: https://github.com/tmlr-group/EOE. |
| Open Datasets | Yes | The ID datasets for far OOD detection encompass CUB-200-2011 (Wah et al., 2011), STANFORD-CARS (Krause et al., 2013), Food-101 (Bossard et al., 2014), Oxford-IIIT Pet (Parkhi et al., 2012) and ImageNet-1K (Deng et al., 2009). As for the OOD datasets, we use the large-scale OOD datasets iNaturalist (Van Horn et al., 2018), SUN (Xiao et al., 2010), Places (Zhou et al., 2017), and Texture (Cimpoi et al., 2014) curated by MOS (Huang & Li, 2021). |
| Dataset Splits | Yes | Fine-grained OOD Detection. We split CUB-200-2011, STANFORD-CARS, Food-101, and Oxford-IIIT Pet. Specifically, half of the classes from each dataset are randomly selected as ID data, while the remaining classes constitute OOD data. Importantly, there is no overlap between the above ID dataset and the corresponding OOD dataset. (A split sketch follows the table.) |
| Hardware Specification | Yes | All experiments are performed using the PyTorch 1.13 library (Paszke et al., 2019) and Python 3.10.8, running on an NVIDIA A100 80GB PCIe GPU and AMD EPYC 7H12 CPU. |
| Software Dependencies | Yes | All experiments are performed using the PyTorch 1.13 library (Paszke et al., 2019) and Python 3.10.8, running on an NVIDIA A100 80GB PCIe GPU and AMD EPYC 7H12 CPU. |
| Experiment Setup | Yes | Unless otherwise specified, we adopt ViT-B/16 as the image encoder and a masked self-attention Transformer (Vaswani et al., 2017) as the text encoder in our experiments. The pre-trained weights of CLIP are sourced from the official weights provided by OpenAI. In addition, we adopt the GPT-3.5-turbo-16k model as the LLM for our research, with the temperature parameter set to 0. The hyperparameter β is set to 0.25 in the main results. For each dataset, we guide the LLM to envision 500 outlier classes, i.e., L = 500. (Envisioning and scoring sketches follow the table.) |
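
The Experiment Setup row specifies GPT-3.5-turbo-16k at temperature 0 and L = 500 envisioned outlier classes per dataset. Below is a minimal sketch of that envisioning step using the OpenAI Python client; the prompt wording and the `envision_outlier_classes` helper are illustrative assumptions, not the paper's actual prompt templates (those are given in the paper and the released code).

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def envision_outlier_classes(id_labels, num_outliers=500):
    """Ask the LLM for class names that differ from the ID classes.
    The prompt below is a placeholder; the paper uses its own templates."""
    prompt = (
        f"Here is a list of known classes: {', '.join(id_labels)}. "
        f"List {num_outliers} other class names that do not appear in "
        "this list, one per line."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-16k",
        temperature=0,  # deterministic decoding, matching the paper's setup
        messages=[{"role": "user", "content": prompt}],
    )
    text = response.choices[0].message.content
    return [line.strip() for line in text.splitlines() if line.strip()]
```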
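
Algorithm 1 performs zero-shot OOD detection with CLIP over the joint space of ID and envisioned outlier labels. The sketch below is one plausible reading of the scoring step, assuming the score contrasts the top ID softmax probability against β times the top outlier probability with β = 0.25; the prompt template and the `eoe_score` name are assumptions, and the exact scoring function is defined in the paper and repository.

```python
import clip
import torch

@torch.no_grad()
def eoe_score(image, model, preprocess, id_labels, outlier_labels,
              beta=0.25, device="cuda"):
    """Return an ID-ness score: higher means more likely in-distribution."""
    prompts = [f"a photo of a {c}" for c in id_labels + outlier_labels]
    text_tokens = clip.tokenize(prompts).to(device)
    img = preprocess(image).unsqueeze(0).to(device)
    image_feat = model.encode_image(img)
    text_feat = model.encode_text(text_tokens)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    # Softmax over the joint (ID + envisioned outlier) label space.
    probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1).squeeze(0)
    k = len(id_labels)
    # Penalize images that strongly match an envisioned outlier class.
    return (probs[:k].max() - beta * probs[k:].max()).item()

# Usage: model, preprocess = clip.load("ViT-B/16", device="cuda")
```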
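
The Dataset Splits row describes a class-level split: half of each dataset's classes become ID, the rest OOD, with no overlap. A minimal sketch, assuming a fixed seed for reproducibility (the helper name is illustrative):

```python
import random

def split_classes(class_names, seed=0):
    """Randomly assign half of the classes to ID and the rest to OOD."""
    rng = random.Random(seed)
    shuffled = list(class_names)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    id_classes, ood_classes = shuffled[:half], shuffled[half:]
    # By construction the two label sets are disjoint.
    assert not set(id_classes) & set(ood_classes)
    return id_classes, ood_classes
```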
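
The Research Type row quotes the two evaluation metrics, FPR95 and AUROC. A minimal sketch of their computation from per-sample detection scores, assuming higher scores indicate ID (function names are illustrative):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def fpr_at_95_tpr(id_scores, ood_scores):
    """FPR95: fraction of OOD samples kept when 95% of ID samples are kept."""
    # Threshold at the 5th percentile of ID scores, so 95% of ID lies above.
    threshold = np.percentile(id_scores, 5)
    return float(np.mean(np.asarray(ood_scores) >= threshold))

def auroc(id_scores, ood_scores):
    """AUROC with ID as the positive class; higher score => more ID-like."""
    labels = np.concatenate([np.ones(len(id_scores)), np.zeros(len(ood_scores))])
    scores = np.concatenate([id_scores, ood_scores])
    return roc_auc_score(labels, scores)
```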