On the Pitfalls of Analyzing Individual Neurons in Language Models
Authors: Omer Antverg, Yonatan Belinkov
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment with disentangling probe quality and ranking quality by using a probe from one method with a ranking from another method, and comparing the different probe-ranking combinations. We primarily experiment with the M-BERT model (Devlin et al., 2019) on 9 languages and 13 morphological attributes from the Universal Dependencies dataset (Zeman et al., 2020). We also experiment with XLM-R (Conneau et al., 2020), and find that most of our results are similar between the models, with a few differences which we discuss. |
| Researcher Affiliation | Academia | Omer Antverg Technion Israel Institute of Technology omer.antverg@cs.technion.ac.il; Yonatan Belinkov Technion Israel Institute of Technology belinkov@technion.ac.il |
| Pseudocode | No | The paper describes methods with mathematical formulas and textual explanations but does not include structured pseudocode blocks or algorithms. |
| Open Source Code | Yes | Our code is available at: https://github.com/technion-cs-nlp/Individual-Neurons-Pitfalls |
| Open Datasets | Yes | We primarily experiment with the M-BERT model (Devlin et al., 2019) on 9 languages and 13 morphological attributes, from the Universal Dependencies dataset (Zeman et al., 2020). |
| Dataset Splits | Yes | We perform a sweep search on the values of β in the range [1, 12] on a dev set. |
| Hardware Specification | Yes | We performed our experiments on NVIDIA RTX 2080 Ti GPU. |
| Software Dependencies | No | As language models, we used the implementation of the transformers library (Wolf et al., 2020). The paper does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | We use the hyperparameters reported in Durrani et al. (2020) and Torroba Hennigen et al. (2020) for training the classifiers. For an increasing k ≤ N, we train a classifier f : H_k → Z to predict the task label, F(h), solely from h_{Π(d)[:k]} (the subvector of the representation h in the top k neurons in ranking Π), ignoring the rest of the neurons. We find β = 8 to be a balanced point, and thus report test results with β = 8 in three configs in Table 1, and the rest of the configs in Appendix A.11. (A minimal sketch of this top-k probing setup follows the table.) |
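
The top-k probing protocol quoted in the Experiment Setup row, and the probe/ranking mix-and-match described in the Research Type row, can be summarized with a short sketch. This is not the authors' released code: the array names, the synthetic data, and the plain logistic-regression probe (standing in for the probes of Durrani et al., 2020 and Torroba Hennigen et al., 2020) are illustrative assumptions; only the "train a probe on the top-k neurons of a given ranking" step mirrors the quoted setup. Because the ranking is passed in as an argument, it can come from a different method than the probe, which is the disentanglement the paper evaluates.

```python
# Minimal sketch (not the authors' code): evaluate a neuron ranking by training
# a probe on only the top-k neurons, for increasing k, and reporting accuracy.
# Assumptions: `reps_*` are (n_samples, N) arrays of hidden states for one layer,
# `y_*` hold morphological attribute labels, and `ranking` is a length-N
# permutation of neuron indices produced by some ranking method.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


def top_k_probe_accuracy(reps_train, y_train, reps_test, y_test, ranking, ks):
    """Train a probe on the sub-vector h_{Pi[:k]} for each k; return test accuracies."""
    accuracies = {}
    for k in ks:
        top_k = ranking[:k]                        # indices of the k highest-ranked neurons
        probe = LogisticRegression(max_iter=1000)  # stand-in probe
        probe.fit(reps_train[:, top_k], y_train)   # probe only sees the selected neurons
        preds = probe.predict(reps_test[:, top_k])
        accuracies[k] = accuracy_score(y_test, preds)
    return accuracies


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N = 768                                        # mBERT hidden size
    reps_train = rng.normal(size=(2000, N))
    reps_test = rng.normal(size=(500, N))
    y_train = rng.integers(0, 3, size=2000)        # e.g. a 3-valued morphological attribute
    y_test = rng.integers(0, 3, size=500)
    ranking = rng.permutation(N)                   # stand-in for a real ranking method
    print(top_k_probe_accuracy(reps_train, y_train, reps_test, y_test,
                               ranking, ks=[10, 50, 150, 768]))
```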