Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Evaluating Neuron Interpretation Methods of NLP Models
Authors: Yimin Fan, Fahim Dalvi, Nadir Durrani, Hassan Sajjad
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We rigorously assess our framework across a diverse array of neuron interpretation methods. Notable findings include: i) despite the theoretical differences among the methods, neuron ranking methods share over 60% of their rankings when identifying salient neurons, ii) the neuron interpretation methods are most sensitive to the last layer representations, iii) Probeless neuron ranking emerges as the most consistent method. |
| Researcher Affiliation | Academia | Yimin Fan: The Chinese University of Hong Kong, Hong Kong, China; Fahim Dalvi, Nadir Durrani, Hassan Sajjad: Qatar Computing Research Institute, HBKU, Qatar; Faculty of Computer Science, Dalhousie University, Canada |
| Pseudocode | No | The paper describes methods using mathematical equations and formal definitions but does not include any blocks explicitly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | https://github.com/fdalvi/neuron-comparative-analysis |
| Open Datasets | Yes | We consider concepts from three linguistic tasks: parts of speech tags (POS, Marcus et al., 1993), semantic tags (SEM, Abzianidze et al., 2017) and syntactic chunking (Chunking) using CoNLL 2000 shared task dataset (Tjong Kim Sang & Buchholz, 2000). |
| Dataset Splits | Yes | We split the binary classification dataset into train/dev/test splits of 70/15/15 percent. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions the 'NeuroX toolkit (Dalvi et al., 2023)' and various regularization techniques (Lasso, Ridge, Elastic Net) and classifiers, but it does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | We use λ1 = 0.01 and λ2 = 0.01 for regularization-based methods. We consider s = 10, 30, 50 which covers a diverse range to generalize the findings. |
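The Research Type finding that neuron ranking methods "share over 60% of their rankings" is an overlap between the salient neurons picked by different methods. The sketch below shows one plausible way to measure it: the fraction of each method's top-k neurons that also appear in another method's top-k. The functions `top_k_overlap` and `pairwise_overlap`, the choice of k, and the method names in the usage are hypothetical; the paper's exact comparison metric is not stated in this report.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple

# Illustrative sketch: the overlap definition below is an assumption,
# not necessarily the metric used in the paper.


def top_k_overlap(ranking_a: Sequence[int], ranking_b: Sequence[int], k: int) -> float:
    """Fraction of the top-k neurons in ranking_a that also appear in the top-k of ranking_b."""
    if k <= 0 or k > min(len(ranking_a), len(ranking_b)):
        raise ValueError("k must be between 1 and the length of the shorter ranking")
    shared = set(ranking_a[:k]) & set(ranking_b[:k])
    return len(shared) / k


def pairwise_overlap(rankings: Dict[str, Sequence[int]], k: int) -> Dict[Tuple[str, str], float]:
    """Top-k overlap for every unordered pair of (hypothetically named) methods."""
    return {
        (a, b): top_k_overlap(rankings[a], rankings[b], k)
        for a, b in combinations(sorted(rankings), 2)
    }
```

For example, with hypothetical rankings `{"method_a": [3, 1, 4, 0, 2], "method_b": [1, 3, 2, 0, 4]}` and `k=3`, the two top-3 sets share neurons 1 and 3, giving an overlap of 2/3.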
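The Research Type row singles out Probeless neuron ranking as the most consistent method. The sketch below assumes that Probeless scores each neuron by summing, over every pair of concept labels, the absolute difference between the neuron's mean activations for those labels, and then ranks neurons by that score. This is our reading of the method, not code from the paper, its repository, or NeuroX; `probeless_scores` and `probeless_rank` are hypothetical names.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence

# Illustrative sketch of a Probeless-style ranking (assumed mean-difference criterion).


def probeless_scores(activations: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> List[float]:
    """Per-neuron score: sum over label pairs of |mean activation(label a) - mean activation(label b)|."""
    if len(activations) != len(labels) or not activations:
        raise ValueError("need one non-empty activation row per label")
    d = len(activations[0])
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for row, label in zip(activations, labels):
        acc = sums.setdefault(label, [0.0] * d)
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    # Mean activation of every neuron for every label.
    means = {label: [s / counts[label] for s in acc] for label, acc in sums.items()}
    scores = [0.0] * d
    for a, b in combinations(means, 2):
        for j in range(d):
            scores[j] += abs(means[a][j] - means[b][j])
    return scores


def probeless_rank(activations: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> List[int]:
    """Neuron indices sorted by decreasing Probeless-style score."""
    scores = probeless_scores(activations, labels)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

Because no classifier is trained, the ranking depends only on class-conditional means, which is one intuitive reason such a method could be stable across runs; the report does not state the paper's explanation.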
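The Dataset Splits row reports a 70/15/15 percent train/dev/test split of the binary classification dataset. A minimal sketch of such a split follows; the shuffling, the fixed seed, the lack of stratification, and the choice to send rounding leftovers to the test set are all assumptions, and `split_70_15_15` is a hypothetical helper rather than a function from the paper's repository.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Illustrative sketch: shuffling and seeding are assumptions, not the paper's procedure.


def split_70_15_15(examples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle examples and split them into 70% train, 15% dev, and 15% test."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # local RNG so the global random state is untouched
    n = len(items)
    n_train = n * 70 // 100  # integer arithmetic avoids float rounding surprises
    n_dev = n * 15 // 100
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]  # any rounding remainder lands in test
    return train, dev, test
```

With 100 examples this yields splits of 70, 15, and 15 items, and the same seed always reproduces the same partition.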
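The Experiment Setup row reports λ1 = 0.01 and λ2 = 0.01 for the regularization-based methods. The sketch below assumes these weight an L1 and a squared-L2 penalty (elastic net) on a binary logistic-regression probe over neuron activations, and that neurons are then ranked by absolute probe weight. The loss form, the full-batch subgradient-descent optimizer, the learning rate, the epoch count, and the names `elastic_net_loss`, `train_probe`, and `rank_neurons` are illustrative assumptions, not the paper's or NeuroX's implementation.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative sketch; only lambda1 = lambda2 = 0.01 comes from the report.


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def elastic_net_loss(w: Sequence[float], b: float, X: Sequence[Sequence[float]],
                     y: Sequence[int], lambda1: float = 0.01, lambda2: float = 0.01) -> float:
    """Mean binary cross-entropy plus lambda1 * L1 and lambda2 * squared-L2 penalties on w."""
    ce = 0.0
    for x_i, y_i in zip(X, y):
        p = _sigmoid(sum(wj * xj for wj, xj in zip(w, x_i)) + b)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
        ce -= y_i * math.log(p) + (1 - y_i) * math.log(1.0 - p)
    l1 = sum(abs(wj) for wj in w)
    l2 = sum(wj * wj for wj in w)
    return ce / len(X) + lambda1 * l1 + lambda2 * l2


def train_probe(X: Sequence[Sequence[float]], y: Sequence[int], lambda1: float = 0.01,
                lambda2: float = 0.01, lr: float = 0.1, epochs: int = 500) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on elastic_net_loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x_i, y_i in zip(X, y):
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, x_i)) + b) - y_i
            for j in range(d):
                grad_w[j] += err * x_i[j] / n
            grad_b += err / n
        for j in range(d):
            sign = (w[j] > 0) - (w[j] < 0)  # subgradient of |w_j|, taken as 0 at w_j == 0
            w[j] -= lr * (grad_w[j] + lambda1 * sign + 2.0 * lambda2 * w[j])
        b -= lr * grad_b
    return w, b


def rank_neurons(w: Sequence[float]) -> List[int]:
    """Neuron indices sorted by decreasing absolute probe weight."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
```

Setting `lambda2=0.0` or `lambda1=0.0` turns the same sketch into Lasso-style or Ridge-style variants, matching the regularizers named in the Software Dependencies row; a production probe would more likely use an autodiff framework than this pure-Python loop.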