Towards In-Distribution Compatible Out-of-Distribution Detection

Authors: Boxi Wu, Jie Jiang, Haidong Ren, Zifan Du, Wenxiao Wang, Zhifeng Li, Deng Cai, Xiaofei He, Binbin Lin, Wei Liu

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. We test ICE on benchmark datasets in Section 5. In the most challenging case, where CIFAR10 (Krizhevsky 2012) serves as the in-distribution set and CIFAR100 as the out-of-distribution set, ICE improves the FPR95 score to 22.36%, an improvement of 3.79% over previous state-of-the-art results. Meanwhile, ICE improves the in-distribution accuracy from 95.11% to 96.38% over plain in-distribution training.
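For reference, FPR95 is the false positive rate on out-of-distribution samples at the score threshold where 95% of in-distribution samples are accepted. A minimal sketch of the metric (not taken from the paper; the convention that higher scores mean "more in-distribution" is an assumption) is:

```python
import numpy as np

def fpr_at_95_tpr(id_scores: np.ndarray, ood_scores: np.ndarray) -> float:
    """FPR on OOD data at the threshold where 95% of ID samples are accepted."""
    # Threshold chosen so that 95% of in-distribution scores lie at or above it.
    threshold = np.percentile(id_scores, 5)
    # Fraction of OOD samples wrongly accepted as in-distribution.
    return float(np.mean(ood_scores >= threshold))

# Example usage (hypothetical score arrays):
# fpr = fpr_at_95_tpr(scores_on_cifar10_test, scores_on_cifar100)
```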
Researcher Affiliation: Collaboration. 1 State Key Lab of CAD&CG, Zhejiang University. 2 Tencent Data Platform. 3 School of Software Technology, Zhejiang University. 4 Ningbo Zhoushan Port Group Co., Ltd., Ningbo, China.
Pseudocode: No. The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code: No. The paper does not include any explicit statement or link regarding open-source code availability for the described methodology.
Open Datasets: Yes. For the in-distribution set, we choose the standard CIFAR10 and CIFAR100 (Krizhevsky 2012) as our major verification targets. For the out-of-distribution set, we adopt several commonly used benchmarks, including Textures (Cimpoi et al. 2014), SVHN (Netzer et al. 2011), Places365 (Zhou et al. 2018), LSUN (Yu et al. 2015), and iSUN (Xu et al. 2015). We also use CIFAR100 as an OOD source to evaluate models learned on CIFAR10 and vice versa. We choose 80 Million Tiny Images (Torralba, Fergus, and Freeman 2008) as the outlier-exposure set D_out^oe.
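The paper does not describe data loaders or preprocessing, so the following is only a hedged sketch, assuming torchvision datasets and standard CIFAR10 normalization statistics, of how the ID/OOD benchmark pairs could be assembled:

```python
import torchvision.transforms as T
from torchvision.datasets import CIFAR10, CIFAR100, SVHN

# Assumed preprocessing: tensor conversion plus CIFAR10 channel statistics.
transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

# In-distribution set.
id_train = CIFAR10(root="data", train=True, download=True, transform=transform)
id_test = CIFAR10(root="data", train=False, download=True, transform=transform)

# OOD evaluation sources; Textures, Places365, LSUN, and iSUN would be loaded
# from locally prepared image folders in the same fashion.
ood_cifar100 = CIFAR100(root="data", train=False, download=True, transform=transform)
ood_svhn = SVHN(root="data", split="test", download=True, transform=transform)
```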
Dataset Splits: No. The paper mentions using standard datasets such as CIFAR10 and CIFAR100, but it does not explicitly provide details about the training/validation/test splits used for reproducibility, such as percentages or sample counts for each split, nor does it explicitly state that a validation set was used.
Hardware Specification: No. The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed machine specifications) used for running the experiments.
Software Dependencies: No. The paper mentions various algorithms and models (e.g., deep neural networks, softmax cross-entropy, the t-SNE algorithm) but does not provide specific software names with version numbers for dependencies (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup: Yes. Fine-tune: We initialize the model with a checkpoint pre-trained on the in-distribution set and then fine-tune the model for ten epochs. We adopt a cosine decay learning rate schedule with an initial value of 0.01. From-scratch: We train deep networks with both D_in^train and D_out^oe for 100 epochs. The initial learning rate is set to 0.1 with the commonly used stair-wise decay learning rate schedule. For both protocols, the batch size is set to 128 for D_in^train and 256 for D_out^oe.
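The quoted setup can be summarized as a short configuration sketch; the optimizer choice, momentum, weight decay, and stair-wise decay milestones below are assumptions not stated in the quoted text:

```python
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR, MultiStepLR
from torch.utils.data import DataLoader

def make_loaders(id_train_set, oe_set):
    # Batch sizes from the paper: 128 for D_in^train, 256 for D_out^oe.
    id_loader = DataLoader(id_train_set, batch_size=128, shuffle=True)
    oe_loader = DataLoader(oe_set, batch_size=256, shuffle=True)
    return id_loader, oe_loader

def finetune_schedule(model):
    # Fine-tune protocol: 10 epochs, cosine decay from an initial lr of 0.01.
    optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
    scheduler = CosineAnnealingLR(optimizer, T_max=10)
    return optimizer, scheduler

def from_scratch_schedule(model):
    # From-scratch protocol: 100 epochs, initial lr of 0.1 with stair-wise decay
    # (milestones at epochs 50 and 75 are assumed, not given in the paper).
    optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
    scheduler = MultiStepLR(optimizer, milestones=[50, 75], gamma=0.1)
    return optimizer, scheduler
```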