Unsupervised Layer-Wise Score Aggregation for Textual OOD Detection

Authors: Maxime Darrin, Guillaume Staerman, Eduardo Dadalto Camara Gomes, Jackie C. K. Cheung, Pablo Piantanida, Pierre Colombo

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On this augmented benchmark, we show that the proposed post-aggregation methods achieve robust and consistent results comparable to using the best layer according to an oracle while removing manual feature selection altogether.
Researcher Affiliation | Academia | 1 International Laboratory on Learning Systems, 2 MILA Quebec AI Institute, 3 McGill University, 4 Université Paris-Saclay, 5 Laboratoire signaux et systèmes, 6 CNRS, 7 CentraleSupélec, 8 Canada CIFAR AI Chair, Mila, 9 INRIA, CEA, Paris, 10 Equal contribution, 11 MICS
Pseudocode | No | The paper describes its framework in 4 steps and refers to Figure 2 for a depiction, but it does not include formal pseudocode blocks or algorithms (a hedged sketch of the aggregation idea is given after this table).
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described.
Open Datasets | Yes | English benchmark. We relied on the benchmark proposed by Zhou, Liu, and Chen (2021); Hendrycks et al. (2020). It features three types of IN-DS: sentiment analysis (i.e., SST2 (Socher et al. 2013), IMDB (Maas et al. 2011)), topic classification (i.e., 20Newsgroup (Joachims 1996)) and question answering (i.e., TREC-10 and TREC-50 (Li and Roth 2002)). We also included the Massive dataset (FitzGerald et al. 2022) and the Banking dataset (Casanueva et al. 2020) for a larger number of classes, as well as NLI datasets (i.e., RTE (Burger and Ferro 2005; Hickl et al. 2006) and MNLI (Williams, Nangia, and Bowman 2018)). (An illustrative loading snippet is given after this table.)
Dataset Splits | No | The paper states 'Following standard protocol (Hendrycks et al. 2020), we train a classifier for each in-distribution dataset (IN-DS)...' but does not explicitly provide specific percentages or counts for training, validation, or test splits. The details of the 'standard protocol' are not given within the paper text itself.
Hardware Specification | No | The paper states: 'This work was performed using HPC resources from GENCI IDRIS (Grant 2022-AD011013945). This work was supported by HPC resources of CINES and GENCI. The authors would like to thank the staff of CINES for technical support in managing the Adastra GPU cluster...' While a 'GPU cluster' is mentioned, the paper does not provide specific details such as GPU model numbers, CPU type, or memory, which are required for full reproducibility.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as Python versions or library versions (e.g., PyTorch, TensorFlow, scikit-learn).
Experiment Setup | No | The paper does not explicitly provide details about the experimental setup such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or specific training configurations. It mentions fine-tuning models but not the parameters used for fine-tuning (a purely illustrative configuration is sketched after this table).
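
Since the framework is described only in prose (see the Pseudocode row above), the following is a minimal sketch of one way an unsupervised layer-wise score aggregation could look. It assumes a Mahalanobis-style per-layer score and an empirical-CDF rank average as the aggregation rule; both choices, and all function names, are illustrative rather than the paper's exact method.

```python
import numpy as np

def layer_scores(hidden_states, layer_stats):
    """Per-layer OOD score for one input: Mahalanobis-style distance of each
    layer's embedding to that layer's in-distribution mean/precision.
    (Illustrative per-layer scorer; other per-layer scores could be used.)"""
    scores = []
    for h, (mu, precision) in zip(hidden_states, layer_stats):
        d = h - mu
        scores.append(float(d @ precision @ d))
    return np.array(scores)

def aggregate_scores(test_layer_scores, reference_layer_scores):
    """Unsupervised aggregation: map each layer's score to its empirical CDF
    position among held-out in-distribution scores, then average across layers."""
    ranks = [
        (reference_layer_scores[:, layer] < score).mean()
        for layer, score in enumerate(test_layer_scores)
    ]
    return float(np.mean(ranks))
```

A higher aggregated value flags the input as more likely out-of-distribution; the point of the aggregation step is that no single 'best' layer has to be selected by hand.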
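All of the in-distribution datasets listed in the Open Datasets row are publicly available; the snippet below shows one plausible way to pull them. The Hub and library identifiers are assumptions (the paper only names and cites the datasets), so they may need adjusting.

```python
from datasets import load_dataset
from sklearn.datasets import fetch_20newsgroups

# Dataset identifiers below are assumed Hub names, not taken from the paper.
imdb = load_dataset("imdb")                                # sentiment analysis
sst2 = load_dataset("glue", "sst2")                        # sentiment analysis
trec = load_dataset("trec")                                # question classification (TREC-10/TREC-50 label sets)
bank = load_dataset("banking77")                           # intent classification (Banking)
rte = load_dataset("glue", "rte")                          # NLI
mnli = load_dataset("glue", "mnli")                        # NLI
massive = load_dataset("AmazonScience/massive", "en-US")   # Massive (identifier assumed)
news = fetch_20newsgroups(subset="all")                    # 20Newsgroup via scikit-learn
```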
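Because the Experiment Setup row notes that no fine-tuning hyperparameters are reported, the values below are purely hypothetical placeholders, included only to show the kind of configuration that would need to be stated for the classifier fine-tuning to be reproducible.

```python
from transformers import TrainingArguments

# All values are hypothetical placeholders; none are reported in the paper.
training_args = TrainingArguments(
    output_dir="in-distribution-classifier",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    seed=42,
)
```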