Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Order-Level Attention Similarity Across Language Models: A Latent Commonality

Authors: Jinglin Liang, Jin Zhong, Shuangping Huang, Yunqing Hu, Huiyuan Zhang, Huifang Li, Lixin Fan, Hanlin Gu

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that TOA's cross-LM generalization effectively enhances the performance of unseen LMs. Code is available at https://github.com/jinglin-liang/OLAS. We evaluated the cross-model transfer capability of TOA on four foundational NLP tasks: Relation Extraction (RE) [52], Named Entity Recognition (NER), Dependency Parsing (DP), and Part-of-Speech Tagging (POS). The results for RE and NER are presented in Tables 4 and 5, while those for DP and POS are included in L (Tables 15 and 16).
Researcher Affiliation	Collaboration	Jinglin Liang1, Jin Zhong1, Shuangping Huang1,2, Yunqing Hu3, Huiyuan Zhang3, Huifang Li4, Lixin Fan5, Hanlin Gu5,6. 1South China University of Technology, 2Pazhou Laboratory, 3Zhuzhou CRRC Times Electric Co., 4China Telecom Research Institute, 5WeBank, 6The Hong Kong University of Science and Technology. EMAIL, EMAIL
Pseudocode	No	The paper describes methodologies such as 'Order-Level Decomposition of Attention Rollout' using mathematical equations (e.g., Equation 2) and structural diagrams (e.g., Figure 1), but it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps formatted like code.
Open Source Code	Yes	Extensive experiments demonstrate that TOA's cross-LM generalization effectively enhances the performance of unseen LMs. Code is available at https://github.com/jinglin-liang/OLAS.
Open Datasets	Yes	This paper employs the following five datasets, which are introduced below. CoNLL2012 [54]. UD-English-EWT v2.15 [15]. SemEval-2010 Task 8 [55]. CoNLL2000 [56]. IMDB [57].
Dataset Splits	Yes	UD-English-EWT v2.15 [15]... Its training set contains 12,544 sentences, and the test set contains 2,077 sentences. SemEval-2010 Task 8 [55]... Its training set contains 8,000 sentences, and the test set contains 2,717 sentences. CoNLL2000 [56]... Its training set contains 8,937 sentences, and the test set contains 2,013 sentences. IMDB [57]... Both its training and test sets contain 25,000 sentences each.
Hardware Specification	Yes	All experiments were conducted on a single 40GB NVIDIA A100 GPU.
Software Dependencies	No	The paper references various language models (e.g., Qwen2-1.5B, Llama3.2-3B) and a deep learning architecture (ResNet-18 [50]), but it does not provide specific version numbers for general software dependencies like programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries used in the implementation.
Experiment Setup	Yes	For the CLMs experiments, the axial transformer consists of 5 layers, while for the MLMs experiments, it consists of 3 layers. The number of epochs is set to 15. For each experimental setup, we trained with three learning rates: 1e-4, 3e-5, and 1e-5, and report the best-performing results.
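
The Experiment Setup row describes a small learning-rate sweep per setup. The sketch below shows one way that sweep could be written down; it is not taken from the authors' repository. The names `RunConfig`, `build_configs`, `select_best`, and the `train_and_eval` callback are hypothetical, and the sketch assumes a higher score is better. The quoted text does not say which metric or data split was used to pick the best-performing run, so that choice is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Values quoted in the Experiment Setup row above.
LEARNING_RATES = (1e-4, 3e-5, 1e-5)
NUM_EPOCHS = 15
AXIAL_LAYERS = {"clm": 5, "mlm": 3}  # axial transformer depth per LM family


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for one training run."""
    lm_family: str        # "clm" or "mlm"
    num_layers: int
    num_epochs: int
    learning_rate: float


def build_configs(lm_family: str) -> List[RunConfig]:
    """Return one config per learning rate for the given LM family."""
    if lm_family not in AXIAL_LAYERS:
        raise ValueError(f"unknown LM family: {lm_family!r}")
    return [
        RunConfig(lm_family, AXIAL_LAYERS[lm_family], NUM_EPOCHS, lr)
        for lr in LEARNING_RATES
    ]


def select_best(
    configs: List[RunConfig],
    train_and_eval: Callable[[RunConfig], float],
) -> Tuple[RunConfig, float]:
    """Run every config and keep the highest score.

    Assumption: `train_and_eval` returns a score where higher is better.
    """
    scored = [(cfg, train_and_eval(cfg)) for cfg in configs]
    return max(scored, key=lambda pair: pair[1])
```

A caller would pass its own training routine as `train_and_eval`, for example `select_best(build_configs("mlm"), my_train_fn)`, where `my_train_fn` is whatever function trains one run and returns its evaluation score.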