Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
LabelCoRank: Revolutionizing Long Tail Multi-Label Classification with Co-Occurrence Reranking
Authors: Yan Yan, Junyuan Liu, Bo-Wen Zhang
JAIR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluations on popular datasets including MAG-CS, PubMed, and AAPD demonstrate the effectiveness and robustness of LabelCoRank. |
| Researcher Affiliation | Academia | YAN YAN, China University of Mining & Technology, Beijing, China; JUNYUAN LIU, China University of Mining & Technology, Beijing, China; BO-WEN ZHANG, University of Science and Technology Beijing, China |
| Pseudocode | No | The paper describes the methodology using text and mathematical equations, and provides a block diagram in Fig. 1. However, it does not include explicitly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | The implementation code is publicly available at https://github.com/821code/LabelCoRank. |
| Open Datasets | Yes | The proposed method is evaluated on three well-known public datasets: MAG-CS, PubMed, and AAPD. ... MAG-CS[32]: The dataset consists of 705,407 papers from the Microsoft Academic Graph (MAG), ... PubMed[21]: The dataset comprises 898,546 papers sourced from PubMed, ... AAPD[41]: The dataset comprises English abstracts of computer science papers sourced from arxiv.org, ... |
| Dataset Splits | Yes | Table 1. Dataset statistics. N_trn and N_tst represent the number of documents in the training set and test set, respectively. ... MAG-CS 564340 70534 ... PubMed 718837 89855 ... AAPD 54840 1000 |
| Hardware Specification | Yes | Experiments were conducted on a computer equipped with an Nvidia RTX 4090 GPU and 128 GB of RAM. |
| Software Dependencies | No | The paper mentions "The AdamW optimizer was employed" and "utilized the RoBERTa pre-trained model as a feature extractor," but it does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used to implement the methodology. |
| Experiment Setup | Yes | The AdamW optimizer was employed with a learning rate of 1e-5, sentence truncation set to 512, and batch size of 16. The threshold hyperparameter α was set to 0.3, and the loss function weight hyperparameter β was set to 0.3, 0.3, and 0.25 for the MAG-CS, PubMed, and AAPD datasets, respectively. The number of selected labels K for these datasets was 30, 35, and 20, respectively. |
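To make the title's central idea concrete, the toy sketch below illustrates the general notion of co-occurrence reranking: a classifier's initial label scores are adjusted by how strongly each candidate label co-occurs with the current top-ranked labels, so frequently co-occurring tail labels can be promoted. This is a hedged illustration only, not the paper's actual LabelCoRank method; the function name, the additive boost, and all example labels and weights are assumptions for demonstration.

```python
def cooccurrence_rerank(scores, cooc, k=2, alpha=0.3):
    """Toy co-occurrence reranking sketch (illustrative, not the paper's method).

    scores: dict mapping label -> initial classifier score.
    cooc:   dict mapping (anchor_label, candidate_label) -> co-occurrence weight.
    k:      number of top-ranked labels used as anchors.
    alpha:  strength of the co-occurrence boost.
    Returns labels sorted by boosted score, highest first.
    """
    # Take the top-k labels by initial score as anchors.
    anchors = sorted(scores, key=scores.get, reverse=True)[:k]
    boosted = {}
    for label, s in scores.items():
        # Boost each label by its co-occurrence with the anchor labels.
        boost = sum(cooc.get((a, label), 0.0) for a in anchors if a != label)
        boosted[label] = s + alpha * boost
    return sorted(boosted, key=boosted.get, reverse=True)

# Hypothetical example: a tail label ("stat.ML") with a low initial score
# is promoted because it strongly co-occurs with the top-ranked labels.
initial = {"cs.AI": 0.9, "cs.LG": 0.8, "cs.CV": 0.3, "stat.ML": 0.2}
cooc = {("cs.LG", "stat.ML"): 1.0, ("cs.AI", "stat.ML"): 0.5}
reranked = cooccurrence_rerank(initial, cooc, k=2, alpha=0.5)
```

Here `alpha` plays a role loosely analogous to a mixing weight between first-stage scores and co-occurrence evidence; the paper's reported hyperparameters (α, β, K) configure its actual, more elaborate reranking pipeline.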