On the Powerfulness of Textual Outlier Exposure for Visual OoD Detection
Authors: Sangha Park, Jisoo Mok, Dahuin Jung, Saehyung Lee, Sungroh Yoon
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments demonstrate that generated textual outliers achieve competitive performance on large-scale OoD and hard OoD benchmarks. Furthermore, we conduct empirical analyses of textual outliers to provide primary criteria for designing advantageous textual outliers. |
| Researcher Affiliation | Academia | Sangha Park1, Jisoo Mok1, Dahuin Jung1, Saehyung Lee1, Sungroh Yoon1,2, 1Department of Electrical and Computer Engineering, Seoul National University 2Interdisciplinary Program in Artificial Intelligence, Seoul National University |
| Pseudocode | No | The paper describes the generation processes with figures and text but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/wiarae/TOE |
| Open Datasets | Yes | We use the large-scale ImageNet-1K [6] OoD detection benchmark proposed by Huang et al. [20]. We conduct experiments on four OoD test datasets: subsets of iNaturalist [47], SUN [52], Places [58], Texture [4]. |
| Dataset Splits | Yes | Images from the validation set of the ID dataset, which does not overlap with the test set, are used as inputs... The Adam optimizer is used to train our linear classifier for OoD detection, and batch size, learning rate, and other hyperparameters are tuned on the validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions models and platforms like CLIP, BERT, GPT-3, BLIP-2, and HuggingFace, but does not provide specific version numbers for underlying software dependencies (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We use 0.5 for λ in our training loss, and batch size for both ID and train-time outlier is 32. The temperature value T for the Energy score is set to 1, as per the original paper. Model checkpoints with the highest validation accuracy are evaluated on the test set. We employ a value of 30 for k, which is used in word-level outlier filtering, and a value of 25 for δ. Furthermore, a filtering ratio of 15 is utilized as the p for caption-level outlier analysis. |