Out-of-Distribution Detection by Leveraging Between-Layer Transformation Smoothness

Authors: Fran Jelenić, Josip Jukić, Martin Tutek, Mate Puljiz, Jan Šnajder

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate BLOOD on several text classification tasks with Transformer networks and demonstrate that it outperforms methods with comparable resource requirements.
Researcher Affiliation | Academia | TakeLab, Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia; UKP Lab, Technical University of Darmstadt, Germany
Pseudocode | No (an illustrative sketch follows the table) | The paper describes the method mathematically and conceptually but does not provide pseudocode or an algorithm block.
Open Source Code | Yes | We provide code and data for our experiments: https://github.com/fjelenic/between-layer-ood
Open Datasets | Yes (loading snippet below) | We use eight text classification datasets for ID data: SST-2 (SST; Socher et al., 2013), Subjectivity (SUBJ; Pang & Lee, 2004), AG-News (AGN; Zhang et al., 2015), TREC (TREC; Li & Roth, 2002), Big Patent (BP; Sharma et al., 2019), Amazon Reviews (AR; McAuley et al., 2015), Movie Genre (MG; Maas et al., 2011), and 20 Newsgroups (NG; Lang, 1995). We use the One Billion Word Benchmark (OBW; Chelba et al., 2014) for OOD data, similarly to Ovadia et al. (2019), because of the diversity of the corpus.
Dataset Splits | Yes (sketch below) | We subsample OOD datasets to be of the same size as their ID test set counterparts. We repeated each experiment with five different random seeds that varied the initialization of the classification head and the stochastic nature of the learning procedure. To obtain the weights for the weighted sum, we first create a validation set from 5% of our ID and OOD test sets and then fit the logistic regression.
Hardware Specification | Yes | We conducted our experiments on 4 AMD Ryzen Threadripper 3970X 32-Core Processors and 2x NVIDIA GeForce RTX 3090 GPUs with 24GB of RAM, which took a little bit less than three weeks.
Software Dependencies | Yes (version check below) | We used Python 3.8.5, PyTorch (Paszke et al., 2019) version 1.12.1, Hugging Face Transformers (Wolf et al., 2020) version 4.21.3, Hugging Face Datasets (Lhoest et al., 2021) version 2.11.0, scikit-learn (Pedregosa et al., 2011) version 1.2.2, and CUDA 11.4.
Experiment Setup | Yes (config sketch below) | For fine-tuning we used the Adam optimizer (Kingma & Ba, 2015) with β1 = 0.9, β2 = 0.999, ϵ = 10^-8, and a learning rate of 2×10^-5. We fine-tuned the models for ten epochs. The batch size depends on the dataset used. We repeated each experiment with five different random seeds that varied the initialization of the classification head and the stochastic nature of the learning procedure.
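The pseudocode row above notes that BLOOD is described only mathematically. As a purely illustrative reading of the stated idea (scoring how smooth the between-layer transformations of intermediate representations are), here is a minimal PyTorch sketch that estimates the squared Frobenius norm of each between-layer Jacobian with a Hutchinson-style random-probe estimator. The function and argument names (`blood_score`, `layer_fns`, `num_probes`) are ours, not the authors'; the reference implementation is in the linked repository.

```python
import torch

def blood_score(layer_fns, h0, num_probes=8):
    """Illustrative sketch: average Hutchinson estimate of ||J_l||_F^2 across
    consecutive layers, where J_l is the Jacobian of layer l's transformation
    at the current intermediate representation. Larger values indicate less
    smooth between-layer transformations (more OOD-like under the paper's idea).

    layer_fns: list of callables mapping one intermediate representation to the next.
    h0: the input representation (e.g., an embedding), shape (batch, dim).
    """
    scores = []
    h = h0
    for layer in layer_fns:
        h_in = h.detach().requires_grad_(True)
        h_out = layer(h_in)
        est = 0.0
        for _ in range(num_probes):
            v = torch.randn_like(h_out)
            # For v ~ N(0, I): E[||v^T J||^2] = ||J||_F^2, so the squared norm
            # of the vector-Jacobian product is an unbiased estimate.
            (vjp,) = torch.autograd.grad(h_out, h_in, grad_outputs=v,
                                         retain_graph=True)
            est = est + vjp.pow(2).sum()
        scores.append(est / num_probes)
        h = h_out
    return torch.stack(scores).mean()
```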
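For the open-datasets row, a minimal loading sketch. It assumes the Hugging Face Hub identifiers `ag_news` (one of the ID datasets) and `lm1b` (the One Billion Word Benchmark used as OOD); the paper's exact loading and preprocessing code may differ, but the OOD subsampling step follows the quoted description.

```python
from datasets import load_dataset

# Illustrative dataset identifiers only.
id_test = load_dataset("ag_news", split="test")   # ID test set (AGN)
ood = load_dataset("lm1b", split="test")          # One Billion Word Benchmark (OBW) as OOD

# Subsample the OOD data to the size of the ID test set, as described in the paper.
ood_test = ood.shuffle(seed=0).select(range(len(id_test)))
print(len(id_test), len(ood_test))
```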
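For the dataset-splits row, the weights of the weighted sum are obtained by fitting a logistic regression on a 5% validation slice of the ID and OOD test sets. A minimal scikit-learn sketch, assuming per-layer OOD scores have already been computed (the arrays below are random placeholders standing in for the detector's real outputs):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder per-layer scores (n_examples x n_layers); in practice these come
# from the detector evaluated on the ID and OOD test sets.
rng = np.random.default_rng(0)
scores_id = rng.normal(loc=0.0, size=(500, 12))
scores_ood = rng.normal(loc=1.0, size=(500, 12))

X = np.vstack([scores_id, scores_ood])
y = np.concatenate([np.zeros(len(scores_id)), np.ones(len(scores_ood))])  # 1 = OOD

# Hold out 5% of the combined test data as a validation set and fit the
# logistic regression on it; its coefficients act as per-layer weights.
_, X_val, _, y_val = train_test_split(X, y, test_size=0.05, stratify=y, random_state=0)
weights = LogisticRegression().fit(X_val, y_val).coef_.ravel()
print(weights)
```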
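For the software-dependencies row, a small version check against the pins reported in the paper (this snippet is ours, not from the authors' repository):

```python
import datasets
import sklearn
import torch
import transformers

# Versions reported in the paper.
expected = {"torch": "1.12.1", "transformers": "4.21.3",
            "datasets": "2.11.0", "sklearn": "1.2.2"}
installed = {"torch": torch.__version__, "transformers": transformers.__version__,
             "datasets": datasets.__version__, "sklearn": sklearn.__version__}

for name, want in expected.items():
    got = installed[name]
    status = "OK" if got.startswith(want) else f"MISMATCH (paper used {want})"
    print(f"{name}: {got} {status}")
```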
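For the experiment-setup row, a minimal configuration sketch with the stated Adam hyperparameters and epoch count. The backbone name and label count are placeholders (the paper fine-tunes Transformer classifiers on eight datasets, with dataset-dependent batch sizes).

```python
import torch
from transformers import AutoModelForSequenceClassification

# Placeholder backbone and label count for illustration only.
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=2e-5,             # learning rate reported in the paper
    betas=(0.9, 0.999),  # β1, β2 reported in the paper
    eps=1e-8,            # ϵ reported in the paper
)
num_epochs = 10          # models fine-tuned for ten epochs
```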