Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Your Classifier Can Be Secretly a Likelihood-Based OOD Detector
Authors: Jirayu Burapacheep, Yixuan Li
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the OpenOOD benchmark empirically demonstrate that INK establishes a new state-of-the-art in a variety of OOD detection setups, including both far-OOD and near-OOD. We extensively evaluate our method on the latest OpenOOD benchmarks (Zhang et al., 2023a), containing CIFAR and ImageNet-1k as ID datasets. Our method exhibits significant performance improvements when compared to the state-of-the-art method ASH (Djurisic et al., 2023). |
| Researcher Affiliation | Academia | Jirayu Burapacheep, Department of Computer Science, Stanford University; Yixuan Li, Department of Computer Sciences, University of Wisconsin-Madison |
| Pseudocode | No | The paper describes the methodology using mathematical formulations and theoretical derivations (e.g., Definition 3.1, Theorem 3.1) but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper references the OpenOOD benchmark at https://github.com/Jingkang50/OpenOOD/, but this is a third-party benchmark used by the authors, not the source code for their proposed method (INK). There is no explicit statement or link providing the code for their implementation. |
| Open Datasets | Yes | The benchmark includes CIFAR (Krizhevsky, 2009) and ImageNet-1k (Deng et al., 2009) as ID datasets. For the ImageNet benchmark, iNaturalist (Van Horn et al., 2018), Textures (Cimpoi et al., 2014), and OpenImage-O (Wang et al., 2022) are employed as far-OOD datasets. In addition, we also evaluate on the latest NINCO (Bitterwolf et al., 2023) as the near-OOD dataset. |
| Dataset Splits | Yes | We adhere to the same data split used in OpenOOD (Zhang et al., 2023a), which involves the removal of images with semantic overlap. Specifically, we utilize CIFAR-10 and TIN as the near-OOD datasets, while MNIST (Deng, 2012), SVHN (Netzer et al., 2011), Textures (Cimpoi et al., 2014), and Places (Zhou et al., 2016) serve as our far-OOD datasets. |
| Hardware Specification | Yes | We conduct our experiments on NVIDIA RTX A6000 GPUs (48GB VRAM). |
| Software Dependencies | Yes | We use Ubuntu 22.04.2 LTS as the operating system and install the NVIDIA CUDA Toolkit version 11.6 and cuDNN 8.9. All experiments are implemented in Python 3.8 using the PyTorch 1.8.1 framework. |
| Experiment Setup | Yes | For methods using cross-entropy loss... The initial learning rate is 0.1 and decays by a factor of 10 at epochs 100, 150, and 180. We train the models for 200 epochs on CIFAR. For methods using SupCon loss... The initial learning rate is 0.5 and follows a cosine annealing schedule. We train the models for 500 epochs. The training-time temperature τ is set to be 0.1. For methods using vMF loss, we train on CIFAR using stochastic gradient descent with momentum 0.9 and weight decay 10^-4. The initial learning rate is 0.5 and follows a cosine annealing schedule. We use a batch size of 512 and train the model for 500 epochs. The training-time temperature τ is set to be 0.1. ...For ImageNet-1k, we fine-tune a pre-trained ResNet-50 model in Khosla et al. (2020) for 100 epochs, with an initial learning rate of 0.01 and cosine annealing schedule. |
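The two learning-rate schedules quoted above (step decay for cross-entropy training, cosine annealing for SupCon/vMF training) can be sketched as plain functions. This is an illustrative reconstruction from the quoted hyperparameters only, not the authors' code; the function names are ours, and the cosine formula assumes the standard annealing-to-zero variant.

```python
import math

def step_lr(epoch, base_lr=0.1, milestones=(100, 150, 180), gamma=0.1):
    """Step schedule from the cross-entropy setup: lr starts at 0.1 and
    decays by a factor of 10 at epochs 100, 150, and 180 (200 epochs total)."""
    decays = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** decays

def cosine_lr(epoch, base_lr=0.5, total_epochs=500):
    """Standard cosine annealing (assumed variant, decaying to zero) from the
    SupCon/vMF setup: lr starts at 0.5 over 500 epochs."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))

# step_lr: 0.1 for epochs 0-99, 0.01 for 100-149, 0.001 for 150-179, 0.0001 after
# cosine_lr: 0.5 at epoch 0, smoothly approaching 0 at epoch 500
```

In PyTorch these correspond to `torch.optim.lr_scheduler.MultiStepLR` and `CosineAnnealingLR`, respectively.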