Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
DOS: Diverse Outlier Sampling for Out-of-Distribution Detection
Authors: Wenyu Jiang, Hao Cheng, MingCai Chen, Chongjun Wang, Hongxin Wei
ICLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the superiority of DOS, reducing the average FPR95 by up to 25.79% on CIFAR-100 with TI-300K. |
| Researcher Affiliation | Academia | 1Department of Statistics and Data Science, Southern University of Science and Technology 2State Key Laboratory for Novel Software Technology, Nanjing University |
| Pseudocode | Yes | Algorithm 1: DOS: Diverse Outlier Sampling |
| Open Source Code | Yes | Code is available at: https://github.com/lygjwy/DOS. |
| Open Datasets | Yes | Datasets. We conduct experiments on CIFAR100 (Krizhevsky & Hinton, 2009) as common benchmark and ImageNet-10 (Ming et al., 2022a) as large-scale benchmark. For CIFAR100, a down-sampled version of ImageNet (ImageNet-RC) (Chen et al., 2021) is utilized as an auxiliary OOD training dataset. Additionally, we use the 300K random Tiny Images subset (TI-300K)[1] as an alternative OOD training dataset, due to the unavailability of the original 80 Million Tiny Images[2] in previous work (Hendrycks et al., 2019b). [1] https://github.com/hendrycks/outlier-exposure |
| Dataset Splits | No | The paper mentions training and test datasets but does not explicitly provide details about a separate validation dataset split (e.g., specific percentages or sample counts) used for hyperparameter tuning or early stopping. While FPR95 at 95% TPR on ID data implies a threshold selection, it does not constitute a distinct validation *split*. |
| Hardware Specification | Yes | All the experiments are conducted on NVIDIA V100 and all methods are implemented with default parameters using PyTorch. |
| Software Dependencies | No | The paper states that methods are 'implemented with default parameters using PyTorch,' but it does not provide a specific version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The model is trained for 100 epochs using SGD with a momentum of 0.9, a weight decay of 0.0001, and a batch size of 64, for both ID and OOD training data. The initial learning rate is set as 0.1 and decays by a factor of 10 at 75 and 90 epochs. Without tuning, we keep the number of the clustering center the same as the batch size. The model is fine-tuned for 10 epochs using SGD with a momentum of 0.9, a learning rate of 0.001, and a weight decay of 0.00001. |
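
The learning-rate schedule quoted in the Experiment Setup row for the 100-epoch training run can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the names `TRAIN_CONFIG` and `learning_rate` are hypothetical, and it assumes epochs are counted from 0, so the decay takes effect at the start of epochs 75 and 90. The paper's own implementation may index epochs differently.

```python
# Hyperparameters as quoted in the Experiment Setup row (100-epoch run).
# The dictionary layout and key names are hypothetical, chosen for illustration.
TRAIN_CONFIG = {
    "epochs": 100,
    "batch_size": 64,          # used for both ID and OOD training data
    "momentum": 0.9,
    "weight_decay": 1e-4,
    "base_lr": 0.1,
    "lr_milestones": (75, 90),  # epochs at which the rate is reduced
    "lr_decay_factor": 10,      # the rate is divided by this at each milestone
}


def learning_rate(epoch, config=TRAIN_CONFIG):
    """Return the step-decayed learning rate for a 0-indexed epoch.

    Assumption: a milestone applies from the epoch with that index onward.
    """
    # Count how many milestones have already been reached.
    drops = sum(1 for milestone in config["lr_milestones"] if epoch >= milestone)
    return config["base_lr"] / (config["lr_decay_factor"] ** drops)


if __name__ == "__main__":
    # Print the rate on either side of each milestone.
    for epoch in (0, 74, 75, 89, 90, 99):
        print(epoch, learning_rate(epoch))
```

Under these assumptions the rate is 0.1 for epochs 0 to 74, roughly 0.01 for epochs 75 to 89, and roughly 0.001 for epochs 90 to 99. The fine-tuning configuration in the same row uses a fixed learning rate of 0.001 and is not covered by this sketch.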