Beyond Mahalanobis Distance for Textual OOD Detection

Authors: Pierre Colombo, Eduardo Dadalto, Guillaume Staerman, Nathan Noiry, Pablo Piantanida

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our extensive numerical experiments involve 51k model configurations, including various checkpoints, seeds, and datasets, and demonstrate that TRUSTED achieves state-of-the-art performances. We conduct extensive numerical experiments and prove that our method improves over SOTA methods. |
| Researcher Affiliation | Collaboration | Pierre Colombo, Mathématiques et Informatique pour la Complexité et les Systèmes, CentraleSupélec, Université Paris-Saclay, pierre.colombo@centralesupelec.fr ... Nathan Noiry, althiqua.io, noirynathan@gmail.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | 3. We release open-source code and data to ease future research, ensure reproducibility and reduce computation overhead. |
| Open Datasets | Yes | The considered benchmark is composed of three different types of in-distribution datasets (referred to as IN-DS) which are used to train the classifiers: sentiment analysis (i.e., SST2 [88] and IMDB [70]), topic classification (i.e., 20Newsgroup [54]) and question answering (i.e., TREC-10 [61]). (A dataset-loading sketch follows the table.) |
| Dataset Splits | Yes | For splitting we use either the standard split or the one provided by [103]. Notice that after 3k iterations models have converged and no over-fitting is observed even after 20k iterations (i.e., we do not observe an increase in validation loss). |
| Hardware Specification | No | This work was also granted access to the HPC resources of IDRIS under the allocation 2021AP010611665 as well as under the project 2021-101838 made by GENCI. |
| Software Dependencies | No | We trained all models with a dropout rate [89] of 0.2 and a batch size of 32, using ADAMW [55]. |
| Experiment Setup | Yes | We trained all models with a dropout rate [89] of 0.2 and a batch size of 32, using ADAMW [55]. Additionally, the weight decay is set to 0.01, the warmup ratio is set to 0.06, and the learning rate to 10^-5. (A training-configuration sketch follows the table.) |
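
The Open Datasets row names four public corpora. As a point of reference, here is a minimal loading sketch; the paper does not prescribe a loading pipeline, so the Hugging Face hub identifiers ("glue"/"sst2", "imdb", "trec") and the scikit-learn fetcher for 20 Newsgroups are assumptions rather than the authors' code, and may not match the exact dataset versions used in the experiments.

```python
# Hypothetical loading sketch for the in-distribution (IN-DS) corpora; the
# dataset identifiers are common public mirrors, not taken from the paper.
from datasets import load_dataset  # Hugging Face `datasets` library
from sklearn.datasets import fetch_20newsgroups

sst2 = load_dataset("glue", "sst2")              # sentiment analysis (SST2 [88])
imdb = load_dataset("imdb")                      # sentiment analysis (IMDB [70])
trec = load_dataset("trec")                      # question answering (TREC-10 [61])
news = fetch_20newsgroups(subset="train")        # topic classification (20Newsgroup [54])
```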
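
The Experiment Setup row reports every numeric hyperparameter for fine-tuning. The sketch below wires those values into Hugging Face `TrainingArguments`; the Trainer framework, the `bert-base-uncased` checkpoint, and the output directory are illustrative assumptions, since the paper states only the values themselves.

```python
# Hypothetical fine-tuning configuration; only the numeric hyperparameters
# (dropout 0.2, batch size 32, AdamW, weight decay 0.01, warmup ratio 0.06,
# learning rate 10^-5) come from the paper. Everything else is assumed.
from transformers import AutoModelForSequenceClassification, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",              # checkpoint choice is illustrative only
    hidden_dropout_prob=0.2,          # dropout rate of 0.2 [89]
    attention_probs_dropout_prob=0.2,
    num_labels=2,                     # e.g., binary sentiment (SST2/IMDB)
)

training_args = TrainingArguments(
    output_dir="./in-ds-classifier",  # hypothetical path
    per_device_train_batch_size=32,   # batch size of 32
    learning_rate=1e-5,               # learning rate 10^-5
    weight_decay=0.01,                # weight decay 0.01
    warmup_ratio=0.06,                # warmup ratio 0.06
    optim="adamw_torch",              # ADAMW [55]
    max_steps=20_000,                 # convergence reported by ~3k steps, with
                                      # no over-fitting up to 20k iterations
)
```

Note that the dropout keyword names (`hidden_dropout_prob`, `attention_probs_dropout_prob`) are specific to BERT-style configurations; other checkpoints in the paper's 51k-configuration sweep may expose dropout under different names.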