Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Reputation-aware Revenue Allocation for Auction-based Federated Learning
Authors: Xiaoli Tang, Han Yu
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on widely used benchmark datasets, ARAS-AFL demonstrates superior performance compared to state-of-the-art approaches. It outperforms the best baseline by 49.06%, 98.69%, 10.32%, and 4.77% in terms of total revenue, number of data owners, public reputation and accuracy of federated learning models, respectively. |
| Researcher Affiliation | Academia | College of Computing and Data Science, Nanyang Technological University, Singapore EMAIL |
| Pseudocode | No | The paper describes the proposed method and its components through mathematical formulations and descriptive text, but it does not include a distinct, structured pseudocode block or algorithm section. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code, nor does it provide any links to a code repository. |
| Open Datasets | Yes | Our experiments are based on six commonly adopted datasets in FL: MNIST, CIFAR-10, Fashion MNIST (FMNIST) (Xiao, Rasul, and Vollgraf 2017), EMNIST-digits (EMNISTD) / letters (EMNISTL) (Cohen et al. 2017), Kuzushiji-MNIST (KMNIST) (Clanuwat et al. 2018). |
| Dataset Splits | No | The paper describes experimental scenarios such as 'FL training task over-release market' and 'FL training task under-release market' with details on the number of data owners and tasks, but it does not specify how the mentioned datasets (MNIST, CIFAR-10, etc.) were split into training, validation, or test sets for model training and evaluation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'traditional FL training algorithms like FedAvg (McMahan et al. 2017)' but does not specify any software libraries or frameworks with their version numbers that were used for implementation. |
| Experiment Setup | Yes | We set the confidence degree γ in Equation (3) to 0.5 for each DO and the weighting factor ρ in Equation (11) to 0.1. Additionally, we set 0.3 ≤ α_i(t) < 0.5 to ensure that the basic cost of DOs and the basic operating cost of the AFL marketplace were covered. |
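The experiment-setup row above reports the paper's stated hyperparameters (γ = 0.5, ρ = 0.1, and the bound 0.3 ≤ α_i(t) < 0.5). A minimal sketch of that configuration is below; note the paper does not say how α_i(t) is chosen within its bounds, so the uniform sampling here, the function name `sample_alpha`, and the per-round loop are all assumptions for illustration only.

```python
import random

# Hyperparameters as reported in the paper's experiment setup.
GAMMA = 0.5  # confidence degree gamma in Equation (3), same for each data owner (DO)
RHO = 0.1    # weighting factor rho in Equation (11)

# Bounds on alpha_i(t); the paper only states 0.3 <= alpha_i(t) < 0.5.
ALPHA_MIN, ALPHA_MAX = 0.3, 0.5


def sample_alpha(rng: random.Random) -> float:
    """Draw a value for alpha_i(t) within the reported bounds.

    Uniform sampling is an assumption: the paper states only the
    interval, not the selection rule.
    """
    return rng.uniform(ALPHA_MIN, ALPHA_MAX)


# Hypothetical usage: one alpha per data owner in a single round.
rng = random.Random(0)
num_data_owners = 10
alphas = [sample_alpha(rng) for _ in range(num_data_owners)]
assert all(ALPHA_MIN <= a < ALPHA_MAX for a in alphas)
```

Pinning γ and ρ as module-level constants mirrors how the paper fixes them across all DOs and rounds, while α_i(t) varies per data owner and round.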