Out-of-Distribution Detection by Leveraging Between-Layer Transformation Smoothness
Authors: Fran Jelenić, Josip Jukić, Martin Tutek, Mate Puljiz, Jan Šnajder
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate BLOOD on several text classification tasks with Transformer networks and demonstrate that it outperforms methods with comparable resource requirements. |
| Researcher Affiliation | Academia | TakeLab, Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia; UKP Lab, Technical University of Darmstadt, Germany |
| Pseudocode | No | The paper describes the method mathematically and conceptually but does not provide pseudocode or an algorithm block. |
| Open Source Code | Yes | We provide code and data for our experiments: https://github.com/fjelenic/between-layer-ood |
| Open Datasets | Yes | We use eight text classification datasets for ID data: SST-2 (SST; Socher et al., 2013), Subjectivity (SUBJ; Pang & Lee, 2004), AG-News (AGN; Zhang et al., 2015), TREC (TREC; Li & Roth, 2002), BigPatent (BP; Sharma et al., 2019), Amazon Reviews (AR; McAuley et al., 2015), Movie Genre (MG; Maas et al., 2011), and 20 Newsgroups (NG; Lang, 1995). We use the One Billion Word Benchmark (OBW) (Chelba et al., 2014) for OOD data, similarly to Ovadia et al. (2019), because of the diversity of the corpus. |
| Dataset Splits | Yes | We subsample OOD datasets to be of the same size as their ID test set counterparts. We repeated each experiment with five different random seeds that varied the initialization of the classification head and the stochastic nature of the learning procedure. To obtain the weights for the weighted sum, we first create a validation set from 5% of our ID and OOD test sets and then fit the logistic regression. (A sketch of this weighting step appears after the table.) |
| Hardware Specification | Yes | We conducted our experiments on 4 AMD Ryzen Threadripper 3970X 32-Core Processors and 2x NVIDIA GeForce RTX 3090 GPUs with 24GB of RAM, which took a little bit less than three weeks. |
| Software Dependencies | Yes | We used Python 3.8.5, PyTorch (Paszke et al., 2019) version 1.12.1, Hugging Face Transformers (Wolf et al., 2020) version 4.21.3, Hugging Face Datasets (Lhoest et al., 2021) version 2.11.0, scikit-learn (Pedregosa et al., 2011) version 1.2.2, and CUDA 11.4. (A pinned environment sketch appears after the table.) |
| Experiment Setup | Yes | For fine-tuning we used the Adam optimizer (Kingma & Ba, 2015) with β1 = 0.9, β2 = 0.999, ϵ = 10⁻⁸, and a learning rate of 2×10⁻⁵. We fine-tuned the models for ten epochs. The batch size depends on the dataset used. We repeated each experiment with five different random seeds that varied the initialization of the classification head and the stochastic nature of the learning procedure. (A fine-tuning sketch appears after the table.) |
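The reported software versions translate into a pinned environment roughly like the sketch below. This is a reconstruction from the versions quoted above, not a file taken from the paper's repository; Python 3.8.5 and CUDA 11.4 are system-level requirements rather than pip packages.

```text
# requirements.txt sketch pinned to the reported versions (assumes Python 3.8.5 and CUDA 11.4)
torch==1.12.1
transformers==4.21.3
datasets==2.11.0
scikit-learn==1.2.2
```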
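The fine-tuning configuration corresponds roughly to the PyTorch sketch below. Only the optimizer settings (Adam with β1 = 0.9, β2 = 0.999, ϵ = 10⁻⁸, learning rate 2×10⁻⁵), the ten epochs, and the five seeds come from the quoted setup; the model name and number of labels are placeholders, and the dataset-dependent batch size is folded into the data loader.

```python
# Hedged sketch of the reported fine-tuning configuration. MODEL_NAME and NUM_LABELS
# are placeholders, not values taken from the paper.
import torch
from torch.optim import Adam
from transformers import AutoModelForSequenceClassification

MODEL_NAME = "roberta-base"  # assumption: any Transformer encoder with a classification head
NUM_LABELS = 2               # e.g. a binary task such as SST-2; dataset-dependent
NUM_EPOCHS = 10              # "We fine-tuned the models for ten epochs."
SEEDS = [0, 1, 2, 3, 4]      # five seeds varying head initialization and training stochasticity

def fine_tune(train_loader, seed):
    """train_loader yields dicts with input_ids, attention_mask, and labels."""
    torch.manual_seed(seed)  # controls classification-head init, shuffling, and dropout
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)
    # Reported optimizer settings: Adam, beta1=0.9, beta2=0.999, eps=1e-8, lr=2e-5.
    optimizer = Adam(model.parameters(), lr=2e-5, betas=(0.9, 0.999), eps=1e-8)
    model.train()
    for _ in range(NUM_EPOCHS):
        for batch in train_loader:
            optimizer.zero_grad()
            loss = model(**batch).loss  # cross-entropy loss computed from the labels in the batch
            loss.backward()
            optimizer.step()
    return model
```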
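The weighted-sum calibration described under Dataset Splits can be sketched as follows: the OOD pool is subsampled to the ID test-set size, 5% of the pooled ID and OOD examples form a validation split, and a logistic regression over per-layer scores supplies the weights. The function name, the (examples × layers) score layout, and the use of the regression coefficients as the weights are illustrative assumptions rather than details confirmed by the quoted text.

```python
# Hedged sketch of the validation-split weighting; array shapes and the use of
# logistic-regression coefficients as layer weights are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def fit_layer_weights(id_scores, ood_scores, val_fraction=0.05, seed=0):
    """id_scores, ood_scores: arrays of shape (n_examples, n_layers) with per-layer OOD scores."""
    rng = np.random.default_rng(seed)
    # Subsample the OOD pool to the ID test-set size (assumes the OOD pool is at least as large).
    idx = rng.choice(len(ood_scores), size=len(id_scores), replace=False)
    ood_scores = ood_scores[idx]

    X = np.vstack([id_scores, ood_scores])
    y = np.concatenate([np.zeros(len(id_scores)), np.ones(len(ood_scores))])  # 1 = OOD

    # Hold out 5% of the pooled test data as a validation split used only to fit the weights.
    X_eval, X_val, y_eval, y_val = train_test_split(
        X, y, test_size=val_fraction, stratify=y, random_state=seed
    )
    lr = LogisticRegression().fit(X_val, y_val)
    weights = lr.coef_.ravel()      # one weight per layer
    combined = X_eval @ weights     # weighted sum of per-layer scores on the remaining 95%
    return weights, combined, y_eval
```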