On the Pitfalls of Analyzing Individual Neurons in Language Models

Authors: Omer Antverg, Yonatan Belinkov

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experiment with disentangling probe quality from ranking quality by using a probe from one method with a ranking from another method, and comparing the different probe-ranking combinations (a sketch of this pairing appears after the table). We primarily experiment with the M-BERT model (Devlin et al., 2019) on 9 languages and 13 morphological attributes from the Universal Dependencies dataset (Zeman et al., 2020). We also experiment with XLM-R (Conneau et al., 2020) and find that most of our results are similar between the models, with a few differences which we discuss. Our experiments reveal several insights.
Researcher Affiliation | Academia | Omer Antverg, Technion - Israel Institute of Technology, omer.antverg@cs.technion.ac.il; Yonatan Belinkov, Technion - Israel Institute of Technology, belinkov@technion.ac.il
Pseudocode | No | The paper describes its methods with mathematical formulas and textual explanations but does not include structured pseudocode blocks or algorithm listings.
Open Source Code | Yes | Our code is available at: https://github.com/technion-cs-nlp/Individual-Neurons-Pitfalls
Open Datasets | Yes | We primarily experiment with the M-BERT model (Devlin et al., 2019) on 9 languages and 13 morphological attributes from the Universal Dependencies dataset (Zeman et al., 2020).
Dataset Splits | Yes | We perform a sweep over the values of β in the range [1, 12] on a dev set.
Hardware Specification | Yes | We performed our experiments on an NVIDIA RTX 2080 Ti GPU.
Software Dependencies | No | For the language models, we used the implementations from the transformers library (Wolf et al., 2020) (a minimal extraction sketch using this library appears after the table). The paper does not provide specific version numbers for its software dependencies.
Experiment Setup | Yes | We use the hyperparameters reported in Durrani et al. (2020) and Torroba Hennigen et al. (2020) for training the classifiers. For an increasing k ∈ ℕ, we train a classifier f : H_k → Z to predict the task label, F(h), solely from h_Π(d)[:k] (the subvector of the representation h restricted to the top-k neurons in ranking Π), ignoring the rest of the neurons (see the top-k probing sketch after the table). We find β = 8 to be a balanced point, and thus report test results with β = 8 for three configurations in Table 1 and for the remaining configurations in Appendix A.11.
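
The Software Dependencies entry mentions the transformers library. The sketch below is illustrative rather than the authors' code: the bert-base-multilingual-cased checkpoint, the layer index, and first-subword pooling are assumptions, chosen only to show how per-word M-BERT representations could be collected for probing.

```python
# Minimal sketch (not the authors' code): collecting per-word M-BERT
# representations with the transformers library for later probing.
# Assumptions: bert-base-multilingual-cased checkpoint, one hidden layer,
# first-subword pooling for word-level morphological labels.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def extract_representations(words, layer=8):
    """Return one hidden vector per input word from the chosen layer."""
    encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**encoding, output_hidden_states=True)
    hidden = outputs.hidden_states[layer][0]   # (num_subwords, 768)
    # Keep the first subword of each word as that word's representation.
    word_ids = encoding.word_ids(0)
    keep, seen = [], set()
    for idx, wid in enumerate(word_ids):
        if wid is not None and wid not in seen:
            seen.add(wid)
            keep.append(idx)
    return hidden[keep]                        # (num_words, 768)

# Example: one 768-dimensional row per word.
reps = extract_representations(["The", "cats", "sleep"])
```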
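
On the probe-ranking disentanglement described in the Research Type entry: the sketch below uses assumed interfaces rather than the paper's implementation. A ranking is an array of neuron indices, a probe is a scikit-learn classifier, and the particular rankings (linear-probe weight magnitude, class-mean gap) and probes (logistic regression, Gaussian naive Bayes) are stand-ins for the methods compared in the paper.

```python
# Minimal sketch (assumed interfaces): evaluate every (ranking, probe)
# combination on the same top-k neurons, so that probe quality and
# ranking quality can be assessed separately.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

def rank_by_probe_weights(X_train, y_train):
    """Rank neurons by the weight magnitude of a trained linear probe."""
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return np.argsort(-np.abs(probe.coef_).sum(axis=0))

def rank_by_mean_gap(X_train, y_train):
    """Stand-in ranking: neurons with the largest gap between class means."""
    means = np.stack([X_train[y_train == c].mean(axis=0) for c in np.unique(y_train)])
    return np.argsort(-(means.max(axis=0) - means.min(axis=0)))

rankings = {"probe-weights": rank_by_probe_weights, "mean-gap": rank_by_mean_gap}
probes = {"linear": lambda: LogisticRegression(max_iter=1000),
          "gaussian": lambda: GaussianNB()}

def cross_evaluate(X_train, y_train, X_test, y_test, k=50):
    """Test accuracy of each probe type restricted to each ranking's top-k neurons."""
    results = {}
    for r_name, rank_fn in rankings.items():
        top_k = rank_fn(X_train, y_train)[:k]
        for p_name, make_probe in probes.items():
            clf = make_probe().fit(X_train[:, top_k], y_train)
            results[(r_name, p_name)] = clf.score(X_test[:, top_k], y_test)
    return results
```

Holding the probe fixed while varying the ranking (or vice versa) is what allows ranking quality and probe quality to be read off separately.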
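
For the top-k probing protocol in the Experiment Setup entry, a minimal sketch, assuming precomputed representations X, labels y, a ranking Π given as an index array, and a logistic-regression probe in place of the probes used in the paper:

```python
# Minimal sketch (assumptions: precomputed ranking, logistic-regression probe):
# train f on the subvector of each representation restricted to the top-k
# neurons of ranking Pi, for increasing k, and record test accuracy.
import numpy as np
from sklearn.linear_model import LogisticRegression

def top_k_probing_curve(X_train, y_train, X_test, y_test, ranking,
                        ks=(10, 50, 100, 300, 768)):
    """Accuracy when probing only the k highest-ranked neurons, for each k."""
    curve = {}
    for k in ks:
        neurons = np.asarray(ranking)[:k]          # indices of the top-k neurons in Pi
        probe = LogisticRegression(max_iter=1000)
        probe.fit(X_train[:, neurons], y_train)    # f sees only the selected subvector
        curve[k] = probe.score(X_test[:, neurons], y_test)
    return curve
```

The resulting accuracy-versus-k curve can then be compared across rankings while holding the probe fixed.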