Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Entropy-Calibrated Label Distribution Learning

Authors: Yunan Lu, Bowen Xue, Xiuyi Jia, Lei Yang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we conduct extensive experiments on various real-world datasets to demonstrate the effectiveness of our proposal.
Researcher Affiliation	Academia	Yunan Lu Department of Computing The Hong Kong Polytechnic University Hong Kong, China EMAIL Bowen Xue Department of Computing The Hong Kong Polytechnic University Hong Kong, China EMAIL Xiuyi Jia School of Computer Science and Engineering Nanjing University of Science and Technology Nanjing, China EMAIL Lei Yang Department of Computing The Hong Kong Polytechnic University Hong Kong, China EMAIL
Pseudocode	No	The paper describes mathematical formulations and propositions but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	No	In terms of the code, the implementation of our proposed IAR is too simple to provide additional code files to describe it. Nevertheless, we will still make our code public after the paper is published.
Open Datasets	Yes	Datasets. To ensure broad coverage of data complexity and practical scenarios, we select datasets including Jaffe (H: 0.96 ± 0.03) [22], BU-3DFE (H: 0.95 ± 0.04) [34], Movie (H: 0.88 ± 0.06) [1], Music Mood (H: 0.94 ± 0.03) [13], Natural Scene (H: 0.47 ± 0.27) [1], Emotion6 (H: 0.64 ± 0.16) [25], Art Painting (H: 0.72 ± 0.13) [23], and M2B (H: 0.41 ± 0.12) [24].
Dataset Splits	Yes	Experimental Procedure. Given a dataset with label distributions, we first randomly divide the dataset into two subsets (30% is used as the test set and 70% is used as the training set). Further, we train an LDL model on the training set and apply the model to predict the label distribution of the test samples. Then, we evaluate the performance of the LDL model by comparing the ground-truth and the predicted label distributions. Finally, we repeat the above process ten times under randomly different dataset partitions and statistically summarize the results of the ten random experiments.
Hardware Specification	No	Computer resources have a negligible effect on both the experimental results and the main claims of this paper.
Software Dependencies	No	The paper mentions employing L-BFGS to minimize the loss function but does not specify any software versions for libraries, frameworks, or programming languages.
Experiment Setup	Yes	All hyperparameters for these comparison algorithms are tuned within the ranges recommended by their respective publications. For our proposed method, the hyperparameter α is optimized within the range of {1, 10, 20, ..., 100}. We employ L-BFGS to minimize the loss function of our method. Furthermore, to ensure fair comparison, we set the trade-off parameters of the L2 regularization in comparison algorithms as 0, consistent with the implementation of all comparison algorithms. ... For the ablation study, their hyperparameter λ is selected from {10^-3, 10^-2, ..., 10^3}.
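
The evaluation protocol quoted in the Dataset Splits row (ten random 70%/30% train/test partitions, summarized statistically) and the hyperparameter grids quoted in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `repeated_holdout`, the `evaluate` callback, the fixed seed, and the choice of mean and standard deviation as the summary are all assumptions made here, since the quoted text does not specify them.

```python
import random
import statistics


def repeated_holdout(n_samples, evaluate, n_repeats=10, test_fraction=0.3, seed=0):
    """Call evaluate(train_idx, test_idx) on n_repeats random train/test partitions.

    `evaluate` is a hypothetical callback that trains a model on the training
    indices and returns one score for the test indices.
    """
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    n_test = round(n_samples * test_fraction)
    scores = []
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for each repeat
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        scores.append(evaluate(train_idx, test_idx))
    # Summarizing by mean and sample standard deviation is an assumption.
    return statistics.mean(scores), statistics.stdev(scores)


# Hyperparameter grids as quoted in the Experiment Setup row.
ALPHA_GRID = [1] + list(range(10, 101, 10))      # {1, 10, 20, ..., 100}
LAMBDA_GRID = [10.0 ** k for k in range(-3, 4)]  # {10^-3, 10^-2, ..., 10^3}
```

In use, `evaluate` would wrap model training and a label-distribution metric of the reader's choice; the quoted rows do not name the metrics, so none is hard-coded here.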