Beyond Mahalanobis Distance for Textual OOD Detection

Authors: Pierre Colombo, Eduardo Dadalto, Guillaume Staerman, Nathan Noiry, Pablo Piantanida

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our extensive numerical experiments involve 51k model configurations, including various checkpoints, seeds, and datasets, and demonstrate that TRUSTED achieves state-of-the-art performances. We conduct extensive numerical experiments and prove that our method improves over SOTA methods. |
| Researcher Affiliation | Collaboration | Pierre Colombo, Mathématiques et Informatique pour la Complexité et les Systèmes, CentraleSupélec, Université Paris-Saclay, pierre.colombo@centralesupelec.fr ... Nathan Noiry, althiqua.io, noirynathan@gmail.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | 3. We release open-source code and data to ease future research, ensure reproducibility and reduce computation overhead. |
| Open Datasets | Yes | The considered benchmark is composed of three different types of in-distribution datasets (referred to as IN-DS) which are used to train the classifiers: sentiment analysis (i.e., SST2 [88] and IMDB [70]), topic classification (i.e., 20Newsgroup [54]) and question answering (i.e., TREC-10 [61]). (A dataset-loading sketch follows the table.) |
| Dataset Splits | Yes | For splitting we use either the standard split or the one provided by [103]. Notice that after 3k iterations models have converged and no over-fitting is observed even after 20k iterations (i.e., we do not observe an increase in validation loss). |
| Hardware Specification | No | This work was also granted access to the HPC resources of IDRIS under the allocation 2021AP010611665 as well as under the project 2021-101838 made by GENCI. |
| Software Dependencies | No | We trained all models with a dropout rate [89] of 0.2 and a batch size of 32, using ADAMW [55]. |
| Experiment Setup | Yes | We trained all models with a dropout rate [89] of 0.2 and a batch size of 32, using ADAMW [55]. Additionally, the weight decay is set to 0.01, the warmup ratio is set to 0.06, and the learning rate to 10^-5. (A training-configuration sketch follows the table.) |
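
The Open Datasets row names four public corpora. As a point of reference, here is a minimal loading sketch; the paper does not prescribe a loading pipeline, so the Hugging Face hub identifiers ("glue"/"sst2", "imdb", "trec") and the scikit-learn fetcher for 20 Newsgroups are assumptions rather than the authors' code, and may not match the exact dataset versions used in the experiments.

```python
# Hypothetical loading sketch for the in-distribution (IN-DS) corpora; the
# dataset identifiers are common public mirrors, not taken from the paper.
from datasets import load_dataset  # Hugging Face `datasets` library
from sklearn.datasets import fetch_20newsgroups

sst2 = load_dataset("glue", "sst2")              # sentiment analysis (SST2 [88])
imdb = load_dataset("imdb")                      # sentiment analysis (IMDB [70])
trec = load_dataset("trec")                      # question answering (TREC-10 [61])
news = fetch_20newsgroups(subset="train")        # topic classification (20Newsgroup [54])
```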
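
The Experiment Setup row reports every numeric hyperparameter for fine-tuning. The sketch below wires those values into Hugging Face `TrainingArguments`; the Trainer framework, the `bert-base-uncased` checkpoint, and the output directory are illustrative assumptions, since the paper states only the values themselves.

```python
# Hypothetical fine-tuning configuration; only the numeric hyperparameters
# (dropout 0.2, batch size 32, AdamW, weight decay 0.01, warmup ratio 0.06,
# learning rate 10^-5) come from the paper. Everything else is assumed.
from transformers import AutoModelForSequenceClassification, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",              # checkpoint choice is illustrative only
    hidden_dropout_prob=0.2,          # dropout rate of 0.2 [89]
    attention_probs_dropout_prob=0.2,
    num_labels=2,                     # e.g., binary sentiment (SST2/IMDB)
)

training_args = TrainingArguments(
    output_dir="./in-ds-classifier",  # hypothetical path
    per_device_train_batch_size=32,   # batch size of 32
    learning_rate=1e-5,               # learning rate 10^-5
    weight_decay=0.01,                # weight decay 0.01
    warmup_ratio=0.06,                # warmup ratio 0.06
    optim="adamw_torch",              # ADAMW [55]
    max_steps=20_000,                 # convergence reported by ~3k steps, with
                                      # no over-fitting up to 20k iterations
)
```

Note that the dropout keyword names (`hidden_dropout_prob`, `attention_probs_dropout_prob`) are specific to BERT-style configurations; other checkpoints in the paper's 51k-configuration sweep may expose dropout under different names.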