How do Large Language Models Handle Multilingualism?
Authors: Yiran Zhao, Wenxuan Zhang, Guizhen Chen, Kenji Kawaguchi, Lidong Bing
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using PLND, we validate MWork through extensive experiments involving the deactivation of language-specific neurons across various layers and structures. |
| Researcher Affiliation | Collaboration | Yiran Zhao (1,2), Wenxuan Zhang (2,3), Guizhen Chen (2,4), Kenji Kawaguchi (1), Lidong Bing (2,3). Affiliations: 1 National University of Singapore; 2 DAMO Academy, Alibaba Group, Singapore; 3 Hupan Lab, 310023, Hangzhou, China; 4 Nanyang Technological University, Singapore |
| Pseudocode | No | In essence, PLND identifies neurons crucial for handling individual documents, with language-specific neurons being those that consistently show high importance when processing documents in a particular language. |
| Open Source Code | Yes | Our code is available at https://github.com/DAMO-NLP-SG/multilingual_analysis |
| Open Datasets | Yes | Employing various benchmark tasks, including XQuAD (Artetxe et al., 2020) for understanding, MGSM (Shi et al., 2022) for reasoning, X-CSQA (Lin et al., 2021) for knowledge extraction, and XLSum (Hasan et al., 2021) for generation |
| Dataset Splits | No | We adopt the performance on XQuAD in Chinese as the validation set for all languages and all tasks. Specifically, Table 12 shows the result on Vicuna when deactivating language-specific neurons in the understanding layer (DU) and generation layer (DG), where N1 is the number of understanding layers and N2 is the number of generation layers. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory) are provided in the paper. |
| Software Dependencies | No | We employ the cld3 package to detect the language of each token in the embeddings of each layer; it is a language detection library based on the Compact Language Detector 3 model developed by Google. |
| Experiment Setup | Yes | Deactivation Strategy We primarily consider two aspects when selecting the deactivation settings: (1) language-specific neurons versus randomly chosen neurons, and (2) the position of neurons, which encompasses four structures. |
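To make the deactivation setup above concrete, here is a minimal NumPy sketch of the idea: score each feed-forward neuron by how strongly it activates on one language's inputs versus another's, then zero out (deactivate) the most language-specific ones. This is an illustrative stand-in, not the paper's PLND implementation; the importance score, the toy FFN, and the random per-language corpora are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy transformer FFN block: hidden -> intermediate -> hidden.
D_HIDDEN, D_FFN = 16, 64
W_up = rng.normal(size=(D_HIDDEN, D_FFN))
W_down = rng.normal(size=(D_FFN, D_HIDDEN))

def ffn(x, neuron_mask=None):
    """Forward pass; `neuron_mask` zeroes out (deactivates) selected FFN neurons."""
    h = np.maximum(x @ W_up, 0.0)   # ReLU activations, one column per neuron
    if neuron_mask is not None:
        h = h * neuron_mask         # deactivate the masked neurons
    return h @ W_down

def neuron_importance(corpus):
    """Mean absolute activation per FFN neuron over a corpus.
    (A simple stand-in for PLND's importance score, which is defined differently.)"""
    h = np.maximum(corpus @ W_up, 0.0)
    return np.abs(h).mean(axis=0)

# Hypothetical per-language corpora (random stand-ins for real hidden states).
corpus_lang_a = rng.normal(loc=0.5, size=(100, D_HIDDEN))
corpus_lang_b = rng.normal(loc=-0.5, size=(100, D_HIDDEN))

# "Language-specific" neurons: consistently more important for language A.
score = neuron_importance(corpus_lang_a) - neuron_importance(corpus_lang_b)
top_k = np.argsort(score)[-8:]      # the 8 most A-specific neurons

mask = np.ones(D_FFN)
mask[top_k] = 0.0                   # deactivate them

x = corpus_lang_a[0]
out_full = ffn(x)
out_deact = ffn(x, neuron_mask=mask)
```

In the paper's experiments this kind of mask is applied to language-specific versus randomly chosen neurons, and at different positions (layers and structures) in the model; the sketch only shows the masking mechanics for a single FFN block.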