Evaluating Neuron Interpretation Methods of NLP Models

Authors: Yimin Fan, Fahim Dalvi, Nadir Durrani, Hassan Sajjad

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We rigorously assess our framework across a diverse array of neuron interpretation methods. Notable findings include: i) despite the theoretical differences among the methods, neuron ranking methods share over 60% of their rankings when identifying salient neurons, ii) the neuron interpretation methods are most sensitive to the last-layer representations, iii) Probeless neuron ranking emerges as the most consistent method. (A toy overlap computation for finding i appears after the table.)
Researcher Affiliation | Academia | Yimin Fan (The Chinese University of Hong Kong, Hong Kong, China); Fahim Dalvi and Nadir Durrani (Qatar Computing Research Institute, HBKU, Qatar); Hassan Sajjad (Faculty of Computer Science, Dalhousie University, Canada)
Pseudocode | No | The paper describes methods using mathematical equations and formal definitions but does not include any blocks explicitly labeled as "Pseudocode" or "Algorithm".
Open Source Code | Yes | https://github.com/fdalvi/neuron-comparative-analysis
Open Datasets | Yes | We consider concepts from three linguistic tasks: part-of-speech tags (POS; Marcus et al., 1993), semantic tags (SEM; Abzianidze et al., 2017) and syntactic chunking (Chunking) using the CoNLL 2000 shared task dataset (Tjong Kim Sang & Buchholz, 2000).
Dataset Splits | Yes | We split the binary classification dataset into train/dev/test splits of 70/15/15 percent. (The split is reproduced in the probe sketch after the table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions the NeuroX toolkit (Dalvi et al., 2023) and various regularization techniques (Lasso, Ridge, Elastic Net) and classifiers, but it does not specify any software dependencies with version numbers.
Experiment Setup | Yes | We use λ1 = 0.01 and λ2 = 0.01 for regularization-based methods. We consider s = 10, 30, 50, which covers a diverse range to generalize the findings. (A hedged probe sketch showing these settings follows the table.)
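To make finding (i) concrete, the overlap between two ranking methods can be computed as the fraction of shared neuron indices among each method's top-n. The sketch below is a minimal illustration, not the authors' released code; the neuron indices and the `ranking_overlap` helper are hypothetical.

```python
# Toy computation of pairwise ranking overlap, the quantity behind the
# ">60% shared rankings" finding. The rankings below are made up; the
# method names come from the paper, but this is not its implementation.
def ranking_overlap(rank_a, rank_b, top_n):
    """Fraction of neurons shared by the top-n of two saliency rankings."""
    return len(set(rank_a[:top_n]) & set(rank_b[:top_n])) / top_n

# Hypothetical top-10 neuron indices from two ranking methods.
lca_ranking = [412, 87, 903, 15, 330, 771, 56, 248, 690, 102]
probeless_ranking = [87, 412, 15, 771, 903, 56, 118, 330, 44, 509]

print(ranking_overlap(lca_ranking, probeless_ranking, top_n=10))  # 0.7
```

Averaging this score over all method pairs and values of n yields the kind of cross-method agreement the paper reports.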
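The Experiment Setup row can be read as an elastic-net regularized linear probe over neuron activations, with salient neurons taken as the top-s by absolute weight. Below is a minimal sketch under stated assumptions: the data is synthetic, and the mapping of the paper's (λ1, λ2) onto scikit-learn's (alpha, l1_ratio) is our assumption, not something the paper specifies. Only the 70/15/15 split, λ1 = λ2 = 0.01, and s = 10, 30, 50 come from the table above.

```python
# Minimal sketch of the reported setup: 70/15/15 split, elastic-net probe,
# top-s neuron selection. Synthetic data; not the authors' implementation.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier

lambda1, lambda2 = 0.01, 0.01              # L1/L2 strengths from the paper
alpha = lambda1 + lambda2                  # assumed mapping to sklearn's alpha
l1_ratio = lambda1 / (lambda1 + lambda2)   # assumed mapping to l1_ratio

# X: (tokens, neurons) activation matrix; y: binary concept labels (synthetic).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 768))
y = rng.integers(0, 2, size=1000)

# 70/15/15 train/dev/test split, as reported in the paper.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.30, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, random_state=0)

# Elastic-net regularized logistic-regression probe.
probe = SGDClassifier(loss="log_loss", penalty="elasticnet",
                      alpha=alpha, l1_ratio=l1_ratio, max_iter=1000)
probe.fit(X_train, y_train)

# Rank neurons by absolute probe weight; keep the top-s salient ones.
for s in (10, 30, 50):
    top_s = np.argsort(-np.abs(probe.coef_[0]))[:s]
    print(f"s={s}: top neurons {top_s[:5]} ...")
```

The Probeless method named in the findings needs no such classifier: it scores each neuron directly from class-conditional activation statistics, which is one reason it can behave more consistently than probe-based rankings.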