Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Glyce: Glyph-vectors for Chinese Character Representations
Authors: Yuxian Meng, Wei Wu, Fei Wang, Xiaoya Li, Ping Nie, Fan Yin, Muyu Li, Qinghong Han, Xiaofei Sun, Jiwei Li
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that glyph-based models are able to consistently outperform word/char ID-based models in a wide range of Chinese NLP tasks. We are able to set new state-of-the-art results for a variety of Chinese NLP tasks, including tagging (NER, CWS, POS), sentence pair classification, single sentence classification tasks, dependency parsing, and semantic role labeling. |
| Researcher Affiliation | Industry | Shannon.AI EMAIL |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | code is available at https://github.com/ShannonAI/glyce. |
| Open Datasets | Yes | NER For the task of Chinese NER, we used the widely-used OntoNotes, MSRA, Weibo and resume datasets. We used the widely-used PKU, MSR, CITYU and AS benchmarks from SIGHAN 2005 bake-off for evaluation. We use the CTB5, CTB9 and UD1 (Universal Dependencies) benchmarks to test our models. We employ the following four different datasets: (1) BQ (binary classification task) [Bowman et al., 2015]; (2) LCQMC (binary classification task) [Liu et al., 2018], (3) XNLI (three-class classification task) [Williams and Bowman], and (4) NLPCC-DBQA. Datasets that we use include: (1) ChnSentiCorp (binary classification); (2) the Fudan corpus (5-class classification) [Li, 2011]; and (3) Ifeng (5-class classification). For dependency parsing [Chen and Manning, 2014, Dyer et al., 2015], we used the widely-used Chinese Penn Treebank 5.1 dataset for evaluation. For the task of semantic role labeling (SRL) [Roth and Lapata, 2016, Marcheggiani and Titov, 2017, He et al., 2018], we used the CoNLL-2009 shared-task. |
| Dataset Splits | Yes | To enable apples-to-apples comparison, we perform grid parameter search for both baselines and the proposed model on the dev set. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers, such as programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | No | The paper mentions performing a 'grid parameter search' but does not provide concrete hyperparameter values or detailed training configurations within the main text. |