Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Incentivizing Time-Aware Fairness in Data Sharing

Authors: Jiangwei Chen, Kieu Thao Nguyen Pham, Rachael Sim, Arun Verma, Zhaoxuan Wu, Chuan Sheng Foo, Bryan Kian Hsiang Low

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We further illustrate how to generate model rewards that realize the reward values and empirically demonstrate the properties of our methods on synthetic and real-world datasets. [...] This section empirically illustrates the properties of our proposed reward schemes using (a) the synthetic Friedman dataset with 6 input features [11], (b) the Californian housing (Cali H) dataset [44] with 8 input features, and (c) the MNIST dataset [7] of handwritten digit images (28 × 28 pixels).
Researcher Affiliation	Academia	1 Department of Computer Science, National University of Singapore, Singapore; 2 Institute for Infocomm Research, Agency for Science, Technology and Research, Singapore; 3 Singapore-MIT Alliance for Research and Technology, Singapore
Pseudocode	No	The paper describes methods and theoretical foundations but does not include any explicitly labeled pseudocode or algorithm blocks. Procedures are explained in narrative text and mathematical formulations.
Open Source Code	Yes	All datasets used in the paper are publicly accessible, we have also uploaded the code and instructions needed to reproduce the results in the supplementary materials. Once the blind review period is over, we will open-source our code and instructions.
Open Datasets	Yes	This section empirically illustrates the properties of our proposed reward schemes using (a) the synthetic Friedman dataset with 6 input features [11], (b) the Californian housing (Cali H) dataset [44] with 8 input features, and (c) the MNIST dataset [7] of handwritten digit images (28 × 28 pixels). We empirically verify our results on an additional diabetes progression (Dia P) dataset [9] with conditional IG as the valuation function. We also demonstrate our results on the CIFAR-100 dataset [23] with n = 10 parties, and the dual of validation accuracy is used as the valuation function.
Dataset Splits	Yes	For the Friedman, Cali H and Dia P datasets, we use an 80-20 train-test split to obtain Dtrain and Dtest, all parties' data are randomly sampled without replacement from Dtrain. [...] Friedman Dataset (n1 = n2 > n3) We consider a test set with 200 points and parties 1, 2 and 3 having n1 = 300, n2 = 300 and n3 = 200 training points, respectively. [...] Cali H Dataset (n1 > n2 > n3) We consider a test set with 4128 points and parties 1, 2 and 3 having n1 = 600, n2 = 400 and n3 = 200 training points, respectively. [...] Dia P Dataset (n1 = n2 < n3) We use an 80-20 train-test split to obtain a test set with 88 points and parties 1, 2 and 3 having n1 = 75, n2 = 75 and n3 = 125 training points, respectively. [...] MNIST Dataset Since NNs are highly effective on the MNIST dataset, we first create a subsampled version by randomly selecting 20000 data points from the original training set, which serves as the aggregated data Dtrain for all parties. [...] CIFAR-100 Dataset We use all 50,000 training images as the aggregated dataset for all parties.
Hardware Specification	Yes	All experiments were performed on a system equipped with an NVIDIA A16 GPU with 10 GB of VRAM.
Software Dependencies	Yes	The system was configured with NVIDIA driver version 515.43.04 and CUDA version 11.7.
Experiment Setup	Yes	We employ the Gaussian process (GP) regression [59] model for Friedman and Cali H datasets and neural network (NN) for the MNIST dataset. In (a) and (b), each party's data value is measured by conditional IG. In (c), each party's data value is the dual of the validation accuracy. [...] For all GP models, we use automatic relevance determination such that each input feature has a different lengthscale parameter. [...] MNIST Dataset ... We train a NN with one hidden layer of 16 neurons using ReLU activation functions and stochastic gradient descent. The model with the highest validation accuracy over 10 training epochs is selected. [...] CIFAR-100 Dataset ... We train a NN with two hidden layers containing 1024 and 512 neurons each. We use ReLU as the activation function, and the NN is trained using dropout and the Adam optimizer. The model with the highest validation accuracy over 20 training epochs is selected.
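
As an illustration of the procedure quoted in the Dataset Splits row (an 80-20 train-test split followed by sampling each party's data without replacement from the training portion), the following is a minimal Python sketch. It is not the authors' code. The function name split_for_parties, the seed, and the total of 1000 points are assumptions; the total is inferred from the stated 200-point Friedman test set under an 80-20 split. The sketch works on indices only, so it does not depend on any particular dataset loader.

```python
import random


def split_for_parties(n_points, party_sizes, test_fraction=0.2, seed=0):
    """Return (test_idx, parties): disjoint lists of indices into a dataset.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    indices = list(range(n_points))
    rng.shuffle(indices)

    # Hold out the test portion first (80-20 split by default).
    n_test = round(n_points * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    if sum(party_sizes) > len(train_idx):
        raise ValueError("party sizes exceed the training pool")

    # Draw one sample without replacement from the training pool,
    # then slice it so that no point is shared between parties.
    pool = rng.sample(train_idx, sum(party_sizes))
    parties, start = [], 0
    for size in party_sizes:
        parties.append(pool[start:start + size])
        start += size
    return test_idx, parties


# Friedman configuration from the table: n1 = 300, n2 = 300, n3 = 200.
test_idx, parties = split_for_parties(1000, [300, 300, 200])
print(len(test_idx), [len(p) for p in parties])  # 200 [300, 300, 200]
```

The sketch keeps the parties' subsets mutually disjoint, which is one reading of "sampled without replacement"; if each party were instead sampled independently from the training portion, the subsets could overlap.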