On Guaranteed Optimal Robust Explanations for NLP Models
Authors: Emanuele La Malfa, Rhiannon Michelmore, Agnieszka M. Zbrzezny, Nicola Paoletti, Marta Kwiatkowska
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our framework on three widely used sentiment analysis tasks and texts of up to 100 words from SST, Twitter and IMDB datasets, demonstrating the effectiveness of the derived explanations. |
| Researcher Affiliation | Academia | University of Oxford; University of Warmia and Mazury, Olsztyn; Royal Holloway, University of London |
| Pseudocode | Yes | A more detailed discussion (including the pseudo-code) is available in the supplement. |
| Open Source Code | Yes | Code available at https://github.com/EmanueleLM/OREs |
| Open Datasets | Yes | We considered 3 well-established benchmarks for sentiment analysis, namely SST [Socher et al., 2013], Twitter [Go et al., 2009] and IMDB [Maas et al., 2011] datasets. |
| Dataset Splits | No | From these, we have chosen 40 representative input texts, balancing positive and negative examples. |
| Hardware Specification | Yes | Experiments were parallelized on a server with two 24-core Intel Xeon 6252 processors and 256GB of RAM, but each instance is single-threaded and can be executed on a low-end laptop. |
| Software Dependencies | No | Both the HS and MSA algorithms have been implemented in Python and use Marabou [Katz et al., 2019] and Neurify [Wang et al., 2018] to answer robustness queries. |
| Experiment Setup | Yes | In the experiments below, we opted for the kNN-box perturbation space, as we found that the k parameter was easier to interpret and tune than the ϵ parameter for the ϵ-ball space, and improved verification time. (e.g., Figures 2, 3, 4, 7 specify k=15, k=25, k=8, k=10 for kNN boxes; see the sketches after this table) |
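
The kNN-box perturbation space quoted under Experiment Setup admits a simple construction: for each word, gather its k nearest neighbours in embedding space and take the per-dimension minimum and maximum as box bounds. The following is a minimal sketch under that reading; `knn_box`, `emb`, and the toy vectors are illustrative names and values, not taken from the authors' repository.

```python
# Minimal sketch of a kNN-box perturbation space, assuming a dict-like
# word-embedding lookup `emb` mapping words to numpy vectors (e.g., GloVe).
import numpy as np

def knn_box(word, emb, k=15):
    """Return per-dimension (lower, upper) bounds of the smallest axis-aligned
    box containing the embedding of `word` and those of its k nearest neighbours."""
    target = emb[word]
    words = list(emb.keys())
    vectors = np.stack([emb[w] for w in words])
    # Euclidean distance from `word` to every vocabulary embedding.
    dists = np.linalg.norm(vectors - target, axis=1)
    # Keep the k nearest neighbours plus the word itself (distance 0).
    nearest = np.argsort(dists)[: k + 1]
    neighbours = vectors[nearest]
    return neighbours.min(axis=0), neighbours.max(axis=0)

# Example usage with a toy 2-dimensional embedding table.
emb = {
    "good": np.array([0.90, 0.10]),
    "great": np.array([0.95, 0.15]),
    "fine": np.array([0.80, 0.20]),
    "bad": np.array([-0.90, 0.00]),
}
lower, upper = knn_box("good", emb, k=2)
print(lower, upper)
```

The resulting box is what a verifier would treat as the admissible perturbation region for that word's embedding; larger k widens the box and therefore hardens the robustness query.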
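The Software Dependencies row notes that the HS and MSA algorithms call Marabou and Neurify to answer robustness queries. As a loose illustration of how such an oracle can drive explanation extraction, the sketch below runs a deletion-style loop that returns a subset-minimal set of fixed word positions; it is a simplification, not the paper's cost-optimal HS or MSA procedure, and `is_robust` is a hypothetical placeholder for a verifier call rather than the actual Marabou/Neurify interface.

```python
# Hedged sketch: shrink the set of fixed word positions while a generic
# robustness oracle still certifies the prediction. Subset-minimal only;
# the paper's HS and MSA algorithms additionally guarantee optimality.
from typing import Callable, Set

def minimal_explanation(n_words: int,
                        is_robust: Callable[[Set[int]], bool]) -> Set[int]:
    """Return a subset-minimal set of word positions that must stay fixed for
    the prediction to remain robust when the remaining positions are perturbed
    within their perturbation space (as decided by `is_robust`)."""
    fixed = set(range(n_words))        # start by fixing every word
    for i in range(n_words):
        candidate = fixed - {i}
        if is_robust(candidate):       # prediction still certified with word i free?
            fixed = candidate          # then word i is not needed in the explanation
    return fixed

# Toy oracle: the prediction stays robust as long as positions 1 and 3 are fixed.
oracle = lambda fixed: {1, 3} <= fixed
print(sorted(minimal_explanation(5, oracle)))   # -> [1, 3]
```

In the paper, each call to the oracle corresponds to a formal verification query over the chosen perturbation space (e.g., kNN boxes), and optimal explanations are obtained by organising those queries within the HS and MSA search procedures rather than this greedy shrinking.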