Evaluation of Similarity-based Explanations
Authors: Kazuaki Hanawa, Sho Yokoi, Satoshi Hara, Kentaro Inui
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments revealed that the cosine similarity of the gradients of the loss performs best, which would be a recommended choice in practice. ... For this evaluation, we used two image datasets (MNIST (LeCun et al., 1998), CIFAR10 (Krizhevsky, 2009)), two text datasets (TREC (Li & Roth, 2002), AGNews (Zhang et al., 2015)) and two table datasets (Vehicle (Dua & Graff, 2017), Segment (Dua & Graff, 2017)). A sketch of the recommended gradient-cosine relevance metric follows the table. |
| Researcher Affiliation | Academia | Kazuaki Hanawa (1,2), Sho Yokoi (2,1), Satoshi Hara (3), Kentaro Inui (2,1); (1) RIKEN Center for Advanced Intelligence Project, (2) Tohoku University, (3) Osaka University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Procedures are described in narrative text. |
| Open Source Code | Yes | Our implementation is available at https://github.com/k-hanawa/criteria_for_instance_based_explanation |
| Open Datasets | Yes | For this evaluation, we used two image datasets (MNIST (LeCun et al., 1998), CIFAR10 (Krizhevsky, 2009)), two text datasets (TREC (Li & Roth, 2002), AGNews (Zhang et al., 2015)) and two table datasets (Vehicle (Dua & Graff, 2017), Segment (Dua & Graff, 2017)). |
| Dataset Splits | No | The paper mentions training on a subset of training instances and then sampling test instances ('randomly sample 500 test instances from the test set'), but does not explicitly describe a separate validation set split for hyperparameter tuning or model selection. |
| Hardware Specification | Yes | In our experiments, training of the models was run on an NVIDIA GTX 1080 GPU with an Intel Xeon Silver 4112 CPU and 64GB RAM. Testing and computing relevance metrics were run on a Xeon E5-2680 v2 CPU with 256GB RAM. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' and specific types of models (CNN, Bi-LSTM, logistic regression), but it does not specify software components with version numbers (e.g., Python, PyTorch, TensorFlow, or CUDA versions) required to reproduce the experiments. |
| Experiment Setup | Yes | We trained the models using the Adam optimizer with a learning rate of 0.001. A minimal sketch of this training configuration also follows the table. |
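The paper's headline recommendation, quoted in the Research Type row, is to use the cosine similarity of loss gradients as the relevance metric for similarity-based explanations. The authors' own implementation is linked in the Open Source Code row; the following is only a minimal PyTorch sketch of the idea, assuming a classifier trained with cross-entropy, where the names `loss_gradient` and `grad_cos_relevance` are ours, not the paper's:

```python
import torch
import torch.nn.functional as F

def loss_gradient(model, x, y):
    """Flattened gradient of the loss w.r.t. all trainable parameters."""
    loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    grads = torch.autograd.grad(
        loss, [p for p in model.parameters() if p.requires_grad])
    return torch.cat([g.flatten() for g in grads])

def grad_cos_relevance(model, test_x, test_y, train_set):
    """Score each training instance by the cosine similarity between its
    loss gradient and the test instance's loss gradient."""
    g_test = loss_gradient(model, test_x, test_y)
    return [
        F.cosine_similarity(g_test, loss_gradient(model, x, y), dim=0).item()
        for x, y in train_set
    ]
```

Training instances with the highest scores would then be presented as the explanation for the test prediction.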
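The Experiment Setup row reports only the optimizer and learning rate. A minimal sketch of that configuration, assuming PyTorch; the placeholder model, data loader, and training loop are our assumptions, not values stated in the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder model; the paper uses CNN, Bi-LSTM, and logistic
# regression models depending on the dataset.
model = nn.Linear(784, 10)

# Reported setup: Adam optimizer with a learning rate of 0.001.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

def train_epoch(loader):
    for x, y in loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
```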