Correlation-Guided Representation for Multi-Label Text Classification

Authors: Qian-Wen Zhang, Ximing Zhang, Zhao Yan, Ruifang Liu, Yunbo Cao, Min-Ling Zhang

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments over benchmark multi-label datasets clearly validate the effectiveness of the proposed approach, and further analysis demonstrates that it is competitive in both predicting low-frequency labels and convergence speed. In this section, the datasets, comparing algorithms, evaluation metrics and parameter settings are introduced. We use two datasets for MLTC: AAPD [Yang et al., 2018] and RCV1-V2 [Lewis et al., 2004]. Table 1 summarizes the detailed characteristics of the two datasets. Each dataset is divided into a training set, a validation set, and a test set, which are used as basic divisions in the performance experiments of each algorithm [Yang et al., 2018]. We report the detailed experimental results of all comparing algorithms on two datasets in Table 2.
Researcher Affiliation | Collaboration | Qian-Wen Zhang(1), Ximing Zhang(2), Zhao Yan(1), Ruifang Liu(2), Yunbo Cao(1) and Min-Ling Zhang(3,4); (1) Tencent Cloud Xiaowei, Beijing 100080, China; (2) Beijing University of Posts and Telecommunications, Beijing 100876, China; (3) School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; (4) Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, China; cowenzhang@tencent.com, ximingzhang@bupt.edu.cn, zhaoyan@tencent.com, lrf@bupt.edu.cn, yunbocao@tencent.com, zhangml@seu.edu.cn
Pseudocode | No | The paper describes its method using formulas and descriptive text but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | We use two datasets for MLTC: AAPD [Yang et al., 2018] and RCV1-V2 [Lewis et al., 2004]. Table 1 summarizes the detailed characteristics of the two datasets. Each dataset is divided into a training set, a validation set, and a test set, which are used as basic divisions in the performance experiments of each algorithm [Yang et al., 2018].
Dataset Splits | Yes | Each dataset is divided into a training set, a validation set, and a test set, which are used as basic divisions in the performance experiments of each algorithm [Yang et al., 2018].
Hardware Specification | Yes | We implement our experiments in Tensorflow on NVIDIA Tesla P40.
Software Dependencies | No | The paper mentions "Tensorflow" and "base-uncased versions of BERT" but does not provide specific version numbers for these software dependencies, which are required for reproducibility.
Experiment Setup | Yes | The batch size is 32, the learning rate is 5e-5, and the window size of additional layer is 10. Based on WCard(S) and L(S) in Table 1, the maximum total input sequence length is 320. In addition, learning rate decay is added to the BERT training part, which starts with a large learning rate and then decays multiple times [Clark et al., 2019]. Note that all BERT-based models in this paper use learning rate decay technique to improve performance.
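The reported setup (base learning rate 5e-5 with a decay applied during BERT fine-tuning) can be expressed as a schedule function. The sketch below is a minimal illustration only: the paper states the rate "starts with a large learning rate and then decays multiple times" but does not give the exact schedule, so a linear warmup-then-decay shape, the function name, and the warmup fraction are all assumptions, not the authors' implementation.

```python
def linear_decay_lr(step, total_steps, base_lr=5e-5, warmup_frac=0.1):
    """Hypothetical schedule: linear warmup to base_lr, then linear decay to 0.

    base_lr=5e-5 matches the learning rate reported in the paper; the
    warmup fraction and linear shape are illustrative assumptions.
    """
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        # ramp up from 0 to base_lr over the warmup phase
        return base_lr * step / max(1, warmup_steps)
    # decay linearly from base_lr down to 0 over the remaining steps
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Example: per-step rates for a 1000-step fine-tuning run; peaks at 5e-5 after warmup
schedule = [linear_decay_lr(s, 1000) for s in range(1001)]
```

In practice such a schedule would be passed to the optimizer at each step (e.g. via a TensorFlow learning-rate schedule object); only the shape of the curve is sketched here.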