On the Powerfulness of Textual Outlier Exposure for Visual OoD Detection
Authors: Sangha Park, Jisoo Mok, Dahuin Jung, Saehyung Lee, Sungroh Yoon
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments demonstrate that generated textual outliers achieve competitive performance on large-scale OoD and hard OoD benchmarks. Furthermore, we conduct empirical analyses of textual outliers to provide primary criteria for designing advantageous textual outliers. |
| Researcher Affiliation | Academia | Sangha Park1, Jisoo Mok1, Dahuin Jung1, Saehyung Lee1, Sungroh Yoon1,2, 1Department of Electrical and Computer Engineering, Seoul National University 2Interdisciplinary Program in Artificial Intelligence, Seoul National University |
| Pseudocode | No | The paper describes the generation processes with figures and text but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/wiarae/TOE |
| Open Datasets | Yes | We use the large-scale ImageNet-1K [6] OoD detection benchmark proposed by Huang et al. [20]. We conduct experiments on four OoD test datasets: subsets of iNaturalist [47], SUN [52], Places [58], Texture [4]. |
| Dataset Splits | Yes | Images from the validation set of the ID dataset, which does not overlap with the test set, are used as inputs... The Adam optimizer is used to train our linear classifier for OoD detection, and batch size, learning rate, and other hyperparameters are tuned on the validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions models and platforms like CLIP, BERT, GPT-3, BLIP-2, and HuggingFace, but does not provide specific version numbers for underlying software dependencies (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We use 0.5 for λ in our training loss, and batch size for both ID and train-time outlier is 32. The temperature value T for the Energy score is set to 1, as per the original paper. Model checkpoints with the highest validation accuracy are evaluated on the test set. We employ a value of 30 for k, which is used in word-level outlier filtering, and a value of 25 for δ. Furthermore, a filtering ratio of 15 is utilized as the p for caption-level outlier analysis. |