Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Small Language Model Makes an Effective Long Text Extractor
Authors: Yelin Chen, Fanjin Zhang, Jie Tang
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our method achieves state-of-the-art extraction accuracy on three long NER datasets and is capable of extracting entities from long texts in a GPU-memory-friendly manner. |
| Researcher Affiliation | Academia | Yelin Chen¹*, Fanjin Zhang²*, Jie Tang². ¹School of Computer Science and Technology, Xinjiang University, Urumqi 830049, China. ²Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China. EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology using prose and mathematical equations but does not contain a dedicated section or figure labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Code: https://github.com/THUDM/scholarprofiling/tree/main/sener |
| Open Datasets | Yes | We conduct experiments on three NER datasets: Scholar XL (Zhang et al. 2024), SciREX (Jain et al. 2020), and Profiling-07 (Tang, Zhang, and Yao 2007; Tang et al. 2008). |
| Dataset Splits | No | Hyper-parameters are selected based on the F1 score on the validation set. |
| Hardware Specification | Yes | All experiments are conducted on an 8-card 80G Nvidia A100 server. |
| Software Dependencies | No | We choose DeBERTa-V3-large (He, Gao, and Chen 2023) as the PLM for span-based methods and DiffusionNER. We use AdamW (Loshchilov, Hutter et al. 2017) optimizer with a weight decay of 1e-2. |
| Experiment Setup | Yes | We use AdamW (Loshchilov, Hutter et al. 2017) optimizer with a weight decay of 1e-2. The unilateral window sizes of the arrow attention and BiSPA mechanism are both set to 128. We only use low-rank adaptation on the Q and V matrix of the self-attention mechanism with a rank of 8. |
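
The hyper-parameters quoted in the Experiment Setup row can be collected into a small configuration object, as in the minimal sketch below. It is not taken from the authors' repository: the class name, field names, and the `band_mask` helper are hypothetical, the weight-decay value is read as 1e-2, and "unilateral window size" is assumed to mean the number of tokens visible on each side of a position. The mask shown is a plain sliding-window band, not the paper's arrow attention or BiSPA mechanism.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the Experiment Setup row; all field names are hypothetical."""
    optimizer: str = "AdamW"
    weight_decay: float = 1e-2          # assumption: weight decay read as 1e-2
    unilateral_window: int = 128        # same value reported for both attention mechanisms
    lora_rank: int = 8
    lora_targets: Tuple[str, ...] = ("query", "value")  # Q and V matrices; names assumed


def band_mask(seq_len: int, window: int) -> List[List[bool]]:
    """Return a mask where entry [i][j] is True when |i - j| <= window.

    Assumption: a unilateral window of w lets each token see w neighbours
    on each side, plus itself.
    """
    return [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]


setup = ReportedSetup()
# Tiny illustrative mask; the reported window of 128 would be used on real inputs.
mask = band_mask(seq_len=6, window=2)
```

With a band like this, each token attends to at most `2 * window + 1` positions, so attention cost grows linearly with sequence length rather than quadratically.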