Evaluations and Methods for Explanation through Robustness Analysis
Authors: Cheng-Yu Hsieh, Chih-Kuan Yeh, Xuanqing Liu, Pradeep Kumar Ravikumar, Seungyeon Kim, Sanjiv Kumar, Cho-Jui Hsieh
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments across multiple domains and a user study, we validate the usefulness of our evaluation criteria and our derived explanations. (Section 5, EXPERIMENTS) |
| Researcher Affiliation | Collaboration | 1Paul G. Allen School of Computer Science, University of Washington 2Machine Learning Department, Carnegie Mellon University 3Department of Computer Science, UCLA 4Google Research |
| Pseudocode | No | The paper describes greedy algorithms in Section 4.1 ('GREEDY ALGORITHM TO COMPUTE OPTIMAL EXPLANATIONS') and 4.2 ('GREEDY BY SET AGGREGATION SCORE') but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | Yes | Code available at https://github.com/ChengYuHsieh/explanation_robustness. |
| Open Datasets | Yes | We perform the experiments on two image datasets, MNIST (LeCun et al., 2010) and ImageNet (Deng et al., 2009), as well as a text classification dataset, Yahoo! Answers (Zhang et al., 2015). |
| Dataset Splits | Yes | The training and testing splits used in the experiments are the default splits provided by the original datasets. |
| Hardware Specification | Yes | All the experiments were performed on an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz and an NVIDIA GeForce GTX 1080 Ti GPU. |
| Software Dependencies | No | The paper mentions using the 'PyTorch library' and 'the official GLUE repository', but does not specify their version numbers. |
| Experiment Setup | Yes | In the experiments, we set the PGD attack step size to be 1.0 and number of steps to be 100. The hyperparameters are chosen such that the PGD attack could most efficiently provide the tightest upper bound on the true robustness value. As mentioned in Section 4.2, we solve Eqn. 6 by subsampling from all possible subsets of Str. Specifically, we compute the coefficients w with respect to 5000 sampled subsets when learning the regression. For all quantitative results, we report the average over 100 random examples. Following common setup (Sundararajan et al., 2017; Ancona et al., 2018), we use zero as the reference value for all explanations that require baseline. |
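The PGD configuration quoted above (step size 1.0, 100 steps, projection back into the allowed perturbation set) can be sketched as follows. This is a minimal illustrative sketch, not the authors' released code: the function name `pgd_attack`, the L-infinity ball, the toy linear loss, and the epsilon value are all assumptions for demonstration.

```python
import numpy as np

def pgd_attack(x, grad_fn, epsilon=0.1, step_size=1.0, num_steps=100):
    """Projected gradient ascent within an L-infinity ball of radius epsilon.

    x        : clean input (numpy array)
    grad_fn  : callable returning the gradient of the loss w.r.t. the input
    """
    x_adv = x.copy()
    for _ in range(num_steps):
        g = grad_fn(x_adv)
        # signed-gradient ascent step on the loss
        x_adv = x_adv + step_size * np.sign(g)
        # project back onto the epsilon-ball around the clean input
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)
    return x_adv

# Toy example: loss L(x) = w . x has constant gradient w, so the attack
# saturates at the ball boundary in the direction of sign(w).
w = np.array([1.0, -2.0, 0.5])
x0 = np.zeros(3)
x_adv = pgd_attack(x0, lambda x: w, epsilon=0.1)
```

With a large step size relative to epsilon, as in the quoted setup, each iteration overshoots and the projection clips the perturbation to the ball boundary, which is one way such a configuration can quickly yield a tight upper bound on the robustness value.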