Evaluating Neuron Interpretation Methods of NLP Models
Authors: Yimin Fan, Fahim Dalvi, Nadir Durrani, Hassan Sajjad
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We rigorously assess our framework across a diverse array of neuron interpretation methods. Notable findings include: i) despite the theoretical differences among the methods, neuron ranking methods share over 60% of their rankings when identifying salient neurons, ii) the neuron interpretation methods are most sensitive to the last layer representations, iii) Probeless neuron ranking emerges as the most consistent method. *(A toy overlap computation for finding i is sketched after the table.)* |
| Researcher Affiliation | Academia | Yimin Fan: The Chinese University of Hong Kong, Hong Kong, China; Fahim Dalvi, Nadir Durrani: Qatar Computing Research Institute, HBKU, Qatar; Hassan Sajjad: Faculty of Computer Science, Dalhousie University, Canada |
| Pseudocode | No | The paper describes methods using mathematical equations and formal definitions but does not include any blocks explicitly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | https://github.com/fdalvi/neuron-comparative-analysis |
| Open Datasets | Yes | We consider concepts from three linguistic tasks: parts of speech tags (POS, Marcus et al., 1993), semantic tags (SEM, Abzianidze et al., 2017) and syntactic chunking (Chunking) using the CoNLL 2000 shared task dataset (Tjong Kim Sang & Buchholz, 2000). |
| Dataset Splits | Yes | We split the binary classification dataset into train/dev/test splits of 70/15/15 percent. *(A split sketch follows the table.)* |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions the NeuroX toolkit (Dalvi et al., 2023) and various regularization techniques (Lasso, Ridge, Elastic Net) and classifiers, but it does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | We use λ1 = 0.01 and λ2 = 0.01 for regularization-based methods. We consider s = 10, 30, 50, which covers a diverse range to generalize the findings. *(A probe-training sketch follows the table.)* |
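
Finding (i) in the Research Type row states that ranking methods share over 60% of their selected salient neurons. The following is a minimal sketch, not the authors' code, of how such a top-N overlap score can be computed; the function `top_n_overlap` and the toy rankings are hypothetical.

```python
# Minimal sketch (not from the paper's repository): fraction of the
# top-n neurons shared by two ranking methods. Each ranking is a list
# of neuron indices sorted from most to least salient.

def top_n_overlap(ranking_a, ranking_b, n):
    top_a, top_b = set(ranking_a[:n]), set(ranking_b[:n])
    return len(top_a & top_b) / n

# Hypothetical rankings over a 10-neuron toy model:
lasso_rank = [3, 7, 1, 9, 0, 4, 2, 8, 5, 6]
probeless_rank = [7, 3, 9, 2, 1, 5, 0, 4, 6, 8]
print(top_n_overlap(lasso_rank, probeless_rank, n=5))  # 0.8
```

An overlap above 0.6 across most method pairs would correspond to the "over 60%" figure quoted in the table.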
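For the 70/15/15 split quoted in the Dataset Splits row, one plausible reconstruction applies scikit-learn's `train_test_split` twice; the paper does not say which library performs the split, so the helper `split_70_15_15` and the seed are assumptions.

```python
# Minimal sketch of a 70/15/15 train/dev/test split; only an
# illustrative reconstruction, not the authors' preprocessing code.
from sklearn.model_selection import train_test_split

def split_70_15_15(examples, labels, seed=42):
    # First carve off 30% for dev+test, then split that portion
    # half/half, giving 70/15/15 overall.
    x_train, x_rest, y_train, y_rest = train_test_split(
        examples, labels, test_size=0.30, random_state=seed)
    x_dev, x_test, y_dev, y_test = train_test_split(
        x_rest, y_rest, test_size=0.50, random_state=seed)
    return (x_train, y_train), (x_dev, y_dev), (x_test, y_test)
```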
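For the Experiment Setup row, λ1 and λ2 plausibly weight the L1 and L2 penalties of a regularized linear probe (elastic net when both are nonzero). The sketch below, in PyTorch, is an illustrative assumption about how such a probe could be trained on neuron activations; the architecture, optimizer, learning rate, and `train_probe` helper are not from the paper.

```python
# Minimal sketch (assumptions, not the authors' setup): a linear probe
# trained with loss = cross-entropy + l1 * ||W||_1 + l2 * ||W||_2^2.
import torch
import torch.nn as nn

def train_probe(acts, labels, num_classes, l1=0.01, l2=0.01, epochs=10):
    probe = nn.Linear(acts.shape[1], num_classes)  # one weight per neuron
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = ce(probe(acts), labels)
        loss = loss + l1 * probe.weight.abs().sum() \
                    + l2 * probe.weight.pow(2).sum()
        loss.backward()
        opt.step()
    # Per-neuron saliency as the absolute weight mass over classes.
    return probe.weight.abs().sum(dim=0)
```

Regularization-based rankings typically score neurons from these learned weights; s = 10, 30, 50 would then, for example, select the top-s neurons from the returned score.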