Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions
Authors: Kai Liu, Zhihang Fu, Chao Chen, Sheng Jin, Ze Chen, Mingyuan Tao, Rongxin Jiang, Jieping Ye
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to show the proposed hierarchical context descriptions are crucial to precisely and universally define each category. As a result, our method consistently outperforms the competitors on the large-scale OOD datasets, while showing comparable or even better generalization than the remarkable zero-shot methods. In this section, we empirically validate the effectiveness of our CATEX on real-world large-scale classification and OOD detection tasks. |
| Researcher Affiliation | Collaboration | 1Zhejiang University, 2Alibaba Cloud |
| Pseudocode | No | The paper describes its methods in prose and figures, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a direct link to its own source code or explicitly state that its implementation code is publicly available. |
| Open Datasets | Yes | Datasets. Following the common benchmarks in the literature [59, 50, 60, 38], we mainly consider the large-scale ImageNet [11] as the in-distribution data. Subsets of iNaturalist [53], SUN [65], Places [69], and Texture [8] are adopted as the OOD datasets. |
| Dataset Splits | Yes | For pre-processing, we follow Ridnik et al. [42] to clean invalid classes, allocating 50 images per class for validation, and crop-resizing all the images to 224 resolution. |
| Hardware Specification | Yes | We use Python 3.7.13 and PyTorch 1.8.1, and 2 NVIDIA V100-32G GPUs. |
| Software Dependencies | Yes | We use Python 3.7.13 and PyTorch 1.8.1, and 2 NVIDIA V100-32G GPUs. |
| Experiment Setup | Yes | Following the default setting [71], each context consists of 16 learnable 512-D prompt embeddings, which are trained for 50 epochs using the SGD optimizer with a momentum of 0.9. The initial learning rate is 0.002, which is decayed by the cosine annealing rule. |
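
To make the learning-rate schedule quoted in the Experiment Setup row concrete, the following is a minimal sketch of cosine annealing from an initial rate of 0.002 over 50 epochs. It is not the authors' implementation. The quoted text does not state the final learning rate, whether the rate is updated per epoch or per iteration, or whether a warm-up phase is used; the sketch assumes a floor of zero, per-epoch updates, and no warm-up. The function name `cosine_annealed_lr` and its parameters are hypothetical.

```python
import math


def cosine_annealed_lr(epoch, total_epochs=50, base_lr=0.002, min_lr=0.0):
    """Return the learning rate at the start of `epoch` under cosine annealing.

    Assumptions (not stated in the quoted setup): the rate decays to
    `min_lr` = 0, it is updated once per epoch, and there is no warm-up.
    """
    if not 0 <= epoch <= total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs]")
    # The cosine term moves from +1 at epoch 0 to -1 at epoch `total_epochs`.
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


# One learning rate per training epoch (epochs 0 through 49).
schedule = [cosine_annealed_lr(epoch) for epoch in range(50)]
```

Under these assumptions the rate starts at 0.002, passes through half that value at the midpoint of training, and approaches zero at the end. An actual reproduction would more likely rely on the framework's built-in SGD optimizer and cosine scheduler, configured with the momentum of 0.9 reported above.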