How do Large Language Models Handle Multilingualism?

Authors: Yiran Zhao, Wenxuan Zhang, Guizhen Chen, Kenji Kawaguchi, Lidong Bing

NeurIPS 2024

Each entry below gives a reproducibility variable, its assessed result, and the supporting LLM response quoting the paper.
Research Type: Experimental. LLM response: "Using PLND, we validate MWork through extensive experiments involving the deactivation of language-specific neurons across various layers and structures."
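As a rough illustration of what "deactivating" neurons can mean in practice, the PyTorch sketch below zeroes selected feed-forward activations with a forward hook. The module path and neuron indices are hypothetical, not the authors' exact implementation (that lives in the linked repository).

```python
import torch

def deactivate_neurons(module, neuron_indices):
    """Zero out the listed neuron activations in `module`'s output.
    Returns the hook handle so the intervention can be undone."""
    def hook(mod, inputs, output):
        output[..., neuron_indices] = 0.0
        return output
    return module.register_forward_hook(hook)

# Hypothetical usage on a LLaMA-style HuggingFace model:
#   handle = deactivate_neurons(model.model.layers[10].mlp.up_proj,
#                               torch.tensor([3, 17, 42]))
#   ... run the evaluation task ...
#   handle.remove()  # restore the original behaviour
```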
Researcher Affiliation: Collaboration. LLM response: "Yiran Zhao (1,2), Wenxuan Zhang (2,3), Guizhen Chen (2,4), Kenji Kawaguchi (1), Lidong Bing (2,3); affiliations: (1) National University of Singapore, (2) DAMO Academy, Alibaba Group, Singapore, (3) Hupan Lab, 310023, Hangzhou, China, (4) Nanyang Technological University, Singapore"
Pseudocode: No. LLM response: "In essence, PLND identifies neurons crucial for handling individual documents, with language-specific neurons being those that consistently show high importance when processing documents in a particular language."
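The quoted description implies a simple selection rule: a neuron is language-specific if it ranks as highly important for every document of that language. A minimal sketch, assuming per-document importance scores have already been computed (the PLND importance measure itself is left abstract here):

```python
import torch

def language_specific_neurons(importance_per_doc, top_k=1000):
    """Intersect the top-k most important neurons across all documents
    of one language; survivors are the 'consistently important' ones."""
    selected = None
    for importance in importance_per_doc:  # one score tensor per document
        top = set(torch.topk(importance, top_k).indices.tolist())
        selected = top if selected is None else selected & top
    return sorted(selected)
```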
Open Source Code: Yes. LLM response: "Our code is available at https://github.com/DAMO-NLP-SG/multilingual_analysis"
Open Datasets: Yes. LLM response: "Employing various benchmark tasks, including XQuAD (Artetxe et al., 2020) for understanding, MGSM (Shi et al., 2022) for reasoning, X-CSQA (Lin et al., 2021) for knowledge extraction, and XLSum (Hasan et al., 2021) for generation"
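For orientation, one of these benchmarks can be pulled from the HuggingFace Hub as below; the hub identifier and config name are assumptions about the current Hub layout, not details taken from the paper.

```python
from datasets import load_dataset

# XQuAD ships only a "validation" split; config names follow "xquad.<lang>".
# The hub id "google/xquad" is assumed; verify on the Hub before use.
xquad_zh = load_dataset("google/xquad", "xquad.zh", split="validation")
print(xquad_zh[0]["question"])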
Dataset Splits: No. LLM response: "We adopt the performance on XQuAD in Chinese as the validation set for all languages and all tasks. Specifically, Table 12 shows the result on Vicuna when deactivating language-specific neurons in the understanding layers (DU) and generation layers (DG), where N1 is the number of understanding layers and N2 is the number of generation layers."
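The tuning loop implied by this quotation can be sketched as follows; `evaluate_with_deactivation` is a hypothetical callback that deactivates language-specific neurons in the first n1 (understanding) layers and last n2 (generation) layers, then scores XQuAD in Chinese.

```python
def select_layer_counts(n1_candidates, n2_candidates, evaluate_with_deactivation):
    """Grid-search (N1, N2) by validation performance on XQuAD-zh."""
    return max(
        ((n1, n2) for n1 in n1_candidates for n2 in n2_candidates),
        key=lambda pair: evaluate_with_deactivation(*pair),
    )
```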
Hardware Specification: No. LLM response: No specific hardware details (like GPU/CPU models, memory) are provided in the paper.
Software Dependencies: No. LLM response: "We employ the cld3 package to detect the language of each token in the embeddings of each layer; it is a language detection library based on the Compact Language Detector 3 model developed by Google."
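The paper names the cld3 package without pinning a specific binding; the sketch below uses the pycld3 Python binding, which exposes Google's Compact Language Detector 3 through a single call.

```python
import cld3  # pip install pycld3

# The paper applies detection to tokens decoded from each layer's embeddings;
# here we simply classify a raw string.
prediction = cld3.get_language("自然语言处理")
print(prediction.language, prediction.probability, prediction.is_reliable)
```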
Experiment Setup: Yes. LLM response: "Deactivation Strategy: We primarily consider two aspects when selecting the deactivation settings: (1) language-specific neurons versus randomly chosen neurons, and (2) the position of neurons, which encompasses four structures."
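The "randomly chosen neurons" control from this setup can be matched in size to the language-specific set, as in the sketch below; the function name and interface are illustrative, not from the paper's code.

```python
import random

def random_control_neurons(language_specific, hidden_size, seed=0):
    """Sample a size-matched random neuron set, disjoint from the
    language-specific one, to serve as the deactivation control."""
    rng = random.Random(seed)
    excluded = set(language_specific)
    pool = [i for i in range(hidden_size) if i not in excluded]
    return rng.sample(pool, k=len(language_specific))
```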