LocMoE: A Low-overhead MoE for Large Language Model Training
Authors: Jing Li, Zhijie Sun, Xuan He, Li Zeng, Yi Lin, Entong Li, Binfan Zheng, Rongqian Zhao, Xin Chen
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiment results demonstrate that the proposed LocMoE reduces training time per epoch by 12.68% to 22.24% compared to classical routers, such as the hash router and switch router, without impacting model accuracy. |
| Researcher Affiliation | Industry | Jing Li, Zhijie Sun, Xuan He, Li Zeng, Yi Lin, Entong Li, Binfan Zheng, Rongqian Zhao and Xin Chen; Huawei Technologies Co., Ltd; {lijing473, sunzhijie3, hexuan22, zengli43, linyi11, lientong, zhengbinfan1, zhaorongqian, chenxin}@huawei.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | No | Appendix B describes the dataset used: "The materials connected to mobile network operators services are chosen as input corpora. Concretely, blogs and technical documents in the form of iCase, Wiki, core network/Man-Machine Language (MML), configuration translations, feature documents, etc., are collected. These corpora are in Chinese, English, or bilingual (Chinese-English)." However, it does not provide concrete access information (specific link, DOI, repository name, formal citation with authors/year) for the dataset to be publicly available or open. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. It mentions "valid perplexity", which implies a validation set was used, but gives no details on the split. |
| Hardware Specification | Yes | We conduct experiments on the Ascend cluster groups (see environment configuration in Appendix C). The Ascend 910A series NPU has 32 AI Cores, with a maximum memory capacity of 2 TB and a maximum memory bandwidth of 1.07 TB/s. The Ascend 910A chip delivers 320 TFLOPS at half precision (FP16) and 640 TOPS at integer precision (INT8). |
| Software Dependencies | Yes | Our model runs on the MindSpore framework, version 2.0.0. The versions of the Compute Architecture for Neural Networks (CANN) suite (toolkit, CANN, driver) are 5.1.RC2.1, 1.84, and 23.0.rc2, respectively. (See the environment-check sketch after the table.) |
| Experiment Setup | Yes | The hyperparameter configuration of our model is listed in Table 1. Therein, batch size and sink size depend on the number of devices, and the values in the table are under 128N. The total number of experts is obtained as `expert num per dp dim` × `expert parallel` (a worked example follows the table). |
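
As a worked example of the expert-count arithmetic quoted in the Experiment Setup row, here is a minimal Python sketch; the two values below are placeholders for illustration, not the actual Table 1 entries from the paper:

```python
# Placeholder values; the paper's Table 1 holds the real configuration.
expert_num_per_dp_dim = 4  # experts per data-parallel dimension (assumed)
expert_parallel = 8        # expert-parallel degree (assumed)

# Per the paper: total experts = expert_num_per_dp_dim * expert_parallel.
total_experts = expert_num_per_dp_dim * expert_parallel
print(f"total experts: {total_experts}")  # -> total experts: 32
```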
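
And for the Software Dependencies row, a minimal environment-check sketch, assuming a standard MindSpore install (the CANN toolkit, driver, and firmware versions live in the Ascend toolchain and are not queryable from Python this way):

```python
import mindspore

# The paper reports MindSpore 2.0.0; flag any mismatch before reproducing.
EXPECTED = "2.0.0"
if mindspore.__version__ != EXPECTED:
    print(f"warning: MindSpore {mindspore.__version__} found, paper used {EXPECTED}")
```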