Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Small Language Model Makes an Effective Long Text Extractor

Authors: Yelin Chen, Fanjin Zhang, Jie Tang

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that our method achieves state-of-the-art extraction accuracy on three long NER datasets and is capable of extracting entities from long texts in a GPU-memory-friendly manner.
Researcher Affiliation	Academia	Yelin Chen (1)*, Fanjin Zhang (2)*, Jie Tang (2). (1) School of Computer Science and Technology, Xinjiang University, Urumqi 830049, China; (2) Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China. EMAIL, EMAIL
Pseudocode	No	The paper describes the methodology using prose and mathematical equations but does not contain a dedicated section or figure labeled 'Pseudocode' or 'Algorithm'.
Open Source Code	Yes	Code https://github.com/THUDM/scholarprofiling/tree/main/sener
Open Datasets	Yes	We conduct experiments on three NER datasets: Scholar XL (Zhang et al. 2024), SciREX (Jain et al. 2020), and Profiling-07 (Tang, Zhang, and Yao 2007; Tang et al. 2008).
Dataset Splits	No	Hyper-parameters are selected based on the F1 score on the validation set.
Hardware Specification	Yes	All experiments are conducted on an 8-card 80G Nvidia A100 server.
Software Dependencies	No	We choose DeBERTa-V3-large (He, Gao, and Chen 2023) as the PLM for span-based methods and DiffusionNER. We use AdamW (Loshchilov, Hutter et al. 2017) optimizer with a weight decay of 1e-2.
Experiment Setup	Yes	We use AdamW (Loshchilov, Hutter et al. 2017) optimizer with a weight decay of 1e-2. The unilateral window sizes of the arrow attention and BiSPA mechanism are both set to 128. We only use low-rank adaptation on the Q and V matrix of the self-attention mechanism with a rank of 8.
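
Illustrative note on the Experiment Setup row: a "unilateral window size" of 128 can be read as each token attending to at most 128 neighbors on each side, plus itself. The sketch below builds such a banded attention mask under that reading. It is an assumption-laden illustration, not the authors' implementation: the function names `local_window_mask` and `attended_per_token` are hypothetical, and any global-token columns or other structure that the paper's arrow attention or BiSPA mechanism may add are deliberately left out.

```python
def local_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when token i may attend to token j.

    Assumption: "unilateral window" means `window` positions to the left
    and `window` positions to the right of each token, plus the token itself.
    """
    if seq_len < 0 or window < 0:
        raise ValueError("seq_len and window must be non-negative")
    return [
        [abs(i - j) <= window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_per_token(mask: list[list[bool]]) -> list[int]:
    """Count how many positions each token is allowed to attend to."""
    return [sum(row) for row in mask]


if __name__ == "__main__":
    # Small example so the band is easy to inspect by eye.
    small = local_window_mask(seq_len=6, window=2)
    print(attended_per_token(small))  # [3, 4, 5, 5, 4, 3]

    # With the reported window of 128, an interior token sees 2 * 128 + 1 positions.
    wide = local_window_mask(seq_len=300, window=128)
    print(sum(wide[150]))  # 257
```

Under this reading, the number of attended positions per token is bounded by 2 * 128 + 1 = 257 regardless of sequence length, which is consistent with the GPU-memory-friendly behavior quoted in the Research Type row.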