PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition
Authors: Jinghui Lu, Yanjie Wang, Ziwei Yang, Xuejing Liu, Brian Mac Namee, Can Huang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments reveal that PaDeLLM-NER significantly increases inference speed, being 1.76 to 10.22 times faster than the autoregressive approach for both English and Chinese. Concurrently, it maintains prediction quality, as evidenced by micro F-scores that are on par with state-of-the-art approaches under both zero-shot and supervised settings. |
| Researcher Affiliation | Collaboration | 1. ByteDance; 2. University of Chinese Academy of Sciences, China; 3. School of Computer Science, University College Dublin |
| Pseudocode | No | The paper includes figures illustrating the training and inference paradigms (Figure 1 and Figure 2) but does not provide formal pseudocode or algorithm blocks; a hedged sketch of the inference procedure is given after this table. |
| Open Source Code | Yes | All resources are available at https://github.com/GeorgeLuImmortal/PaDeLLM_NER. |
| Open Datasets | Yes | The datasets used in our experiments include: Zero-shot Datasets: To align with the methodology proposed by [44], we train PaDeLLM using the Pile-NER dataset [45]. This dataset comprises around 240,000 entities categorized into 13,000 distinct types, derived from the Pile Corpus [46]. Supervised Datasets: We evaluate our method on supervised English and Chinese NER datasets. Following [30, 49, 50], English datasets include the general domain flat NER CoNLL2003 [51], the nested NER ACE2005 [52], and the biomedical nested NER GENIA [53]. Following [6, 54, 55], Chinese datasets include four commonly used general domain flat NER benchmarks Resume [56], Weibo [57], MSRA [58] and OntoNotes 4.0 [59] and two vertical industrial domain flat NER datasets YouKu [60] and Ecommerce [61]. |
| Dataset Splits | Yes | Table 15 (Dataset Statistics; # denotes the amount; for MSRA, four outlier examples are removed from the test set) reports, e.g.: CoNLL2003: 20,744 sentences (train 14,041 / dev 3,250 / test 3,453) and 35,089 entities (train 23,499 / dev 5,942 / test 5,648); ACE2005: 9,210 sentences (train 7,194 / dev 969 / test 1,047); GENIA: 18,546 sentences (train 15,023 / dev 1,669 / test 1,854). |
| Hardware Specification | Yes | Evaluations of all models were performed on the same NVIDIA A100 GPU. ... Training is conducted on 8 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions using Llama2-7b and Baichuan2-7b as base models, along with AdamW optimizer and cosine scheduler, but it does not specify software versions for libraries like Python, PyTorch, or CUDA, which are necessary for full reproducibility of the environment. |
| Experiment Setup | Yes | We train our model on all datasets for 4 epochs, using a batch size of 128 and a learning rate of 1e-5, with the AdamW optimizer [70] and a cosine scheduler [71]. The maximum input and output sequence lengths are set to 2048 and 512, respectively. ... For all generative models, we use greedy search with a beam size of 1, a maximum of 512 new tokens, and a temperature of 1.0. (An illustrative mapping of these settings is sketched after this table.) |
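
Since the paper provides no formal pseudocode (see the Pseudocode row above), the following is a minimal, hedged sketch of the two-stage parallel decoding idea as we understand it from the paper's figures and description: the model first predicts, for each entity label, how many mentions occur, and then decodes each (label, mention-index) pair as its own short sequence in parallel before merging the results. The `generate` callable, the prompt templates, and the score-based de-duplication rule are our assumptions, not the authors' released implementation.

```python
"""Minimal sketch of two-stage parallel decoding for NER in the PaDeLLM style.
Assumptions (not taken from the paper's code): the `generate` wrapper, the
prompt templates, and the duplicate-resolution rule are illustrative only."""
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# `generate(prompt) -> (text, score)` is assumed to wrap an LLM call that
# returns the decoded continuation and a confidence score (e.g. mean log-prob).
Generator = Callable[[str], Tuple[str, float]]


def predict_mention_count(generate: Generator, text: str, label: str) -> int:
    """Stage 1: ask the model how many mentions of `label` appear in `text`."""
    out, _ = generate(f"Text: {text}\nLabel: {label}\nNumber of mentions:")
    try:
        return int(out.strip())
    except ValueError:
        return 0


def predict_mention(generate: Generator, text: str, label: str, idx: int) -> Tuple[str, float]:
    """Stage 2: decode the idx-th mention for `label` as an independent sequence."""
    return generate(f"Text: {text}\nLabel: {label}\nMention {idx}:")


def padellm_style_ner(generate: Generator, text: str, labels: List[str]) -> Dict[str, List[str]]:
    with ThreadPoolExecutor() as pool:
        # Stage 1: mention counts for all labels, decoded independently.
        counts = dict(zip(labels, pool.map(
            lambda lb: predict_mention_count(generate, text, lb), labels)))
        # Stage 2: every (label, index) pair becomes its own short sequence.
        jobs = [(lb, i) for lb in labels for i in range(1, counts[lb] + 1)]
        results = list(pool.map(
            lambda job: (job[0], *predict_mention(generate, text, *job)), jobs))
    # Aggregation: if the same span is predicted under several labels,
    # keep the highest-scoring one (an assumed de-duplication rule).
    best: Dict[str, Tuple[str, float]] = {}
    for label, mention, score in results:
        if mention not in best or score > best[mention][1]:
            best[mention] = (label, score)
    grouped: Dict[str, List[str]] = {lb: [] for lb in labels}
    for mention, (label, _) in best.items():
        grouped[label].append(mention)
    return grouped
```

In an actual deployment the per-pair decodes would be batched on the GPU rather than threaded; the reported 1.76x to 10.22x latency reduction comes from each decoded sequence being much shorter than a single autoregressive dump of all entities.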
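
For the Experiment Setup row, the sketch below maps the reported hyperparameters onto Hugging Face Transformers objects purely for illustration. The paper does not state that this library or these exact argument names were used; the per-device batch size split across 8 GPUs and the precision flag are assumptions.

```python
# Illustrative mapping of the reported hyperparameters onto Hugging Face
# Transformers objects; not the authors' training script.
from transformers import GenerationConfig, TrainingArguments

training_args = TrainingArguments(
    output_dir="padellm_ner",        # placeholder path
    num_train_epochs=4,
    learning_rate=1e-5,
    per_device_train_batch_size=16,  # 16 x 8 A100 GPUs = effective batch size 128 (assumed split)
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    bf16=True,                       # assumed mixed-precision choice
)
# The maximum input length of 2048 would be enforced at tokenization time.

# Greedy decoding as reported: beam size 1, up to 512 new tokens, temperature 1.0.
generation_config = GenerationConfig(
    do_sample=False,
    num_beams=1,
    max_new_tokens=512,
    temperature=1.0,
)
```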