reproducibilityindex.ai

On Guaranteed Optimal Robust Explanations for NLP Models

Authors: Emanuele La Malfa, Rhiannon Michelmore, Agnieszka M. Zbrzezny, Nicola Paoletti, Marta Kwiatkowska

IJCAI 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our framework on three widely used sentiment analysis tasks and texts of up to 100 words from SST, Twitter and IMDB datasets, demonstrating the effectiveness of the derived explanations.
Researcher Affiliation	Academia	University of Oxford; University of Warmia and Mazury, Olsztyn; Royal Holloway, University of London
Pseudocode	Yes	A more detailed discussion (including the pseudo-code) is available in the supplement.
Open Source Code	Yes	Code available at https://github.com/EmanueleLM/OREs
Open Datasets	Yes	We considered 3 well-established benchmarks for sentiment analysis, namely SST [Socher et al., 2013], Twitter [Go et al., 2009] and IMDB [Maas et al., 2011] datasets.
Dataset Splits	No	From these, we have chosen 40 representative input texts, balancing positive and negative examples.
Hardware Specification	Yes	Experiments were parallelized on a server with two 24-core Intel Xeon 6252 processors and 256GB of RAM, but each instance is single-threaded and can be executed on a low-end laptop.
Software Dependencies	No	Both the HS and MSA algorithms have been implemented in Python and use Marabou [Katz et al., 2019] and Neurify [Wang et al., 2018] to answer robustness queries.
Experiment Setup	Yes	In the experiments below, we opted for the kNN-box perturbation space, as we found that the k parameter was easier to interpret and tune than the ϵ parameter for the ϵ-ball space, and improved verification time. (e.g., Figures 2, 3, 4 and 7 use kNN boxes with k = 15, 25, 8 and 10, respectively.)
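The kNN-box perturbation space mentioned in the Experiment Setup row can be illustrated with a short sketch. It assumes that the box for a word is the axis-aligned bounding box of its embedding together with the embeddings of its k nearest vocabulary neighbours under Euclidean distance, and that a candidate explanation pins its words to their original embeddings while the remaining words range over their boxes. The helper names (`knn_box`, `in_box`, `query_region`) and the toy two-dimensional embeddings are hypothetical; the released code in the repository above may build boxes differently, and the resulting region would be handed to a verifier such as Marabou or Neurify, whose APIs are not shown here.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]
Box = Tuple[List[float], List[float]]  # (lower bounds, upper bounds)


def knn_box(word: str, embeddings: Dict[str, Vector], k: int) -> Box:
    """Hypothetical helper: axis-aligned box spanned by `word` and its k
    nearest neighbours (Euclidean distance) in the embedding table."""
    if k < 1:
        raise ValueError("k must be at least 1")
    centre = embeddings[word]
    # Rank every other vocabulary word by distance to the centre word.
    others = sorted((w for w in embeddings if w != word),
                    key=lambda w: math.dist(centre, embeddings[w]))
    points = [centre] + [embeddings[w] for w in others[:k]]
    dim = len(centre)
    lower = [min(p[i] for p in points) for i in range(dim)]
    upper = [max(p[i] for p in points) for i in range(dim)]
    return lower, upper


def in_box(x: Vector, lower: Vector, upper: Vector) -> bool:
    """True if point x lies inside the box in every dimension."""
    return all(lo <= xi <= hi for xi, lo, hi in zip(x, lower, upper))


def query_region(sentence: List[str], embeddings: Dict[str, Vector],
                 k: int, fixed: Set[int]) -> List[Box]:
    """Hypothetical helper: per-word input bounds for one robustness query.
    Positions in `fixed` (a candidate explanation) are pinned to their
    embedding; every other word may vary within its kNN box."""
    region: List[Box] = []
    for pos, w in enumerate(sentence):
        if pos in fixed:
            x = list(embeddings[w])
            region.append((x, x))  # degenerate box: the word cannot change
        else:
            region.append(knn_box(w, embeddings, k))
    return region


# Usage with toy 2-D embeddings (illustrative values only).
toy = {
    "good": [1.0, 1.0],
    "great": [1.2, 0.9],
    "fine": [0.8, 1.3],
    "bad": [-1.0, -1.0],
}
lower, upper = knn_box("good", toy, k=2)  # box over good, great, fine
region = query_region(["good", "bad"], toy, k=2, fixed={0})
```

A larger k widens each box, so a candidate explanation must be robust to more substitutions; this is one reason k is straightforward to interpret as a tuning knob.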