Image Background Serves as Good Proxy for Out-of-distribution Data

Authors: Sen Pei

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments reveal that SSOD establishes competitive state-of-the-art performance on many large-scale benchmarks, outperforming the best previous method by a large margin, e.g., reporting -6.28% FPR95 and +0.77% AUROC on ImageNet, -19.01% FPR95 and +3.04% AUROC on CIFAR-10, and top-ranked performance on hard OOD datasets, i.e., ImageNet-O and OpenImage-O. (See the metrics sketch after the table.)
Researcher Affiliation | Industry | Sen Pei, ByteDance Inc., peisen@bytedance.com
Pseudocode | No | The paper describes the formulation of SSOD in text, but it does not include explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository.
Open Datasets | Yes | We employ large-scale benchmarks in OOD detection, including ImageNet (Russakovsky et al., 2015) groups, CIFAR-10 (Krizhevsky et al., 2009) groups, and hard OOD groups. In the ImageNet groups, we set four OOD datasets, which are iNaturalist (Horn et al., 2018), SUN (Xiao et al., 2010), Places (Zhou et al., 2018), and Texture (Cimpoi et al., 2014). In the CIFAR-10 groups, we set five OOD datasets, which are SVHN (Netzer et al., 2011), LSUN (Yu et al., 2015), iSUN (Xu et al., 2015), Texture (Cimpoi et al., 2014), and Places (Zhou et al., 2018). Under the hard OOD setting, ImageNet is employed as ID data while ImageNet-O (Huang & Li, 2021) and OpenImage-O (Wang et al., 2022) are selected as OOD data.
Dataset Splits | No | The paper describes training and evaluation protocols, including keeping the quantity of ID and OOD data the same during evaluation and storing checkpoints based on FPR95 performance. However, it does not explicitly define a validation split with percentages or sample counts; a validation set is only implied by the checkpointing procedure.
Hardware Specification | Yes | The experiment runs on 8 NVIDIA Tesla V100 GPUs.
Software Dependencies | No | The paper mentions using 'AdamW as the optimizer' but does not specify versions for any software dependencies, such as programming languages or deep learning frameworks (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | All images used in our experiments are resized to 224x224. We use AdamW as the optimizer. The learning rate starts from 1e-4 and halves every 30 epochs. The experiment runs on 8 NVIDIA Tesla V100 GPUs. The batch size is set to 256, i.e., 32x8: each GPU is allocated 32 images. We store the checkpoints yielding the best FPR95 performance. (A training-loop sketch of this configuration follows the table.)
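
The FPR95 and AUROC numbers quoted in the Research Type row are the standard OOD detection metrics: FPR95 is the false positive rate on OOD data at the threshold that retains 95% of ID data, and AUROC is the area under the ROC curve with ID as the positive class. Below is a minimal sketch of how these metrics are conventionally computed from detector confidence scores, assuming higher scores mean more ID-like; the scores here are synthetic and illustrative, not from the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def fpr_at_95_tpr(id_scores, ood_scores):
    """FPR95: fraction of OOD samples still accepted at the
    threshold that keeps 95% of ID samples."""
    threshold = np.percentile(id_scores, 5)  # 95% of ID scores lie above this
    return float(np.mean(ood_scores >= threshold))

def auroc(id_scores, ood_scores):
    """AUROC with ID as the positive class."""
    labels = np.concatenate([np.ones_like(id_scores), np.zeros_like(ood_scores)])
    scores = np.concatenate([id_scores, ood_scores])
    return float(roc_auc_score(labels, scores))

# Synthetic scores for illustration: a detector that separates
# ID from OOD reasonably well.
rng = np.random.default_rng(0)
id_s = rng.normal(1.0, 1.0, 5000)    # scores on in-distribution data
ood_s = rng.normal(-1.0, 1.0, 5000)  # scores on out-of-distribution data
print(f"FPR95: {fpr_at_95_tpr(id_s, ood_s):.4f}  AUROC: {auroc(id_s, ood_s):.4f}")
```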
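
The Experiment Setup row maps onto a fairly standard training configuration. The sketch below assumes PyTorch, with AdamW at 1e-4, a StepLR scheduler halving the rate every 30 epochs, and 32 images per GPU (x8 GPUs = the stated global batch of 256 under distributed training). The ResNet-50 backbone, random tensors, epoch count, and `evaluate_fpr95` hook are placeholder assumptions, since the paper releases no code.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

def evaluate_fpr95(model: nn.Module) -> float:
    """Hypothetical validation hook: would score a held-out ID/OOD
    split and return FPR95; a constant stands in here."""
    return 0.0

# Placeholders for the paper's (unreleased) model and data:
# a torchvision ResNet-50 and random 224x224 images.
model = models.resnet50(num_classes=1000)
train_set = TensorDataset(torch.randn(64, 3, 224, 224),
                          torch.randint(0, 1000, (64,)))

# 32 images per GPU; with 8 GPUs under distributed training this
# yields the stated global batch size of 256 (32 x 8).
loader = DataLoader(train_set, batch_size=32, shuffle=True)

# AdamW starting at 1e-4, halved every 30 epochs.
optimizer = AdamW(model.parameters(), lr=1e-4)
scheduler = StepLR(optimizer, step_size=30, gamma=0.5)
criterion = nn.CrossEntropyLoss()

best_fpr95 = float("inf")
for epoch in range(90):  # epoch count is an assumption; the paper does not state it
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
    # Store the checkpoint yielding the best FPR95, per the paper.
    fpr95 = evaluate_fpr95(model)
    if fpr95 < best_fpr95:
        best_fpr95 = fpr95
        torch.save(model.state_dict(), "best_ssod.pt")
```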