Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

PubSub-VFL: Towards Efficient Two-Party Split Learning in Heterogeneous Environments via Publisher/Subscriber Architecture

Authors: Yi Liu, Yang Liu, Leqian Zheng, Jue Hong, Junjie Shi, Qingyou Yang, Ye Wu, Cong Wang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive case studies on five benchmark datasets further validate its effectiveness, showing that, compared to state-of-the-art baselines, PubSub-VFL not only accelerates training by 2–7× without compromising accuracy, but also achieves a computational resource utilization rate of up to 91.07%.
Researcher Affiliation	Collaboration	Yi Liu¹, Yang Liu², Leqian Zheng¹, Jue Hong², Junjie Shi², Qingyou Yang², Ye Wu², and Cong Wang¹; ¹Department of Computer Science, City University of Hong Kong; ²ByteDance Inc.
Pseudocode	Yes	The above dynamic programming solution (Algo. 2) and the pseudo code of PubSub-VFL (Algo. 1) can be found in the Appendix E. Algorithm 1 PubSub-VFL Training Framework. Algorithm 2 Optimal Configuration via Dynamic Programming.
Open Source Code	Yes	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We provide the complete code and data in the Supplementary Materials.
Open Datasets	Yes	We evaluate PubSub-VFL on four public benchmark datasets (see Table 6 in Appendix F) spanning both regression and classification tasks, along with a large-scale synthetic dataset. For regression, we use the Energy [43] (19,735 samples, 27 features) and Blog [44] (60,021 samples, 280 features) datasets. For classification, we adopt the Bank [45] (40,787 samples, 48 features) and Credit [46] (30,000 samples, 23 features) datasets. To assess scalability, we generate a synthetic dataset with 1 million samples and 500 features using Scikit-learn [47].
Dataset Splits	Yes	Each dataset is split into 70% training and 30% testing, with training data approximately evenly distributed between two parties.
Hardware Specification	Yes	All experiments are developed using Python 3.9 and PyTorch 1.12 and evaluated on a server with an INTEL(R) XEON(R) GOLD 6530 (64-core CPU).
Software Dependencies	Yes	All experiments are developed using Python 3.9 and PyTorch 1.12 and evaluated on a server with an INTEL(R) XEON(R) GOLD 6530 (64-core CPU).
Experiment Setup	Yes	For a series of constants, we set T_0 = 5, T_ddl = 10 s, p = 5, q = 5. For the constants in the optimization model, we determined them through empirical experiments (see the Appendix H for details). In addition, we set the learning rate to 0.001, the number of workers to w_a/w_p ∈ [2, 50], the batch size B ∈ {16, 32, 64, 128, 256, 512, 1024}, and C_a + C_p = 64.
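
The Dataset Splits row reports a 70% training / 30% testing split, with the training data divided roughly evenly between two parties. The sketch below illustrates one way such a split could look. It is not the authors' code: the function name `split_two_party`, the fixed seed, and the toy data are hypothetical, and it assumes the two-party division is feature-wise (the first half of the columns to one party, the rest to the other), which the quoted text does not state explicitly.

```python
import random


def split_two_party(rows, labels, train_frac=0.7, seed=0):
    """Shuffle samples, split them 70/30, then divide feature columns between two parties."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")

    # Shuffle sample indices reproducibly before splitting.
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_train = int(round(train_frac * len(rows)))

    # Assumption: party A holds the first half of the columns, party B the rest.
    half = len(rows[0]) // 2

    def take(indices):
        return {
            "party_a": [rows[i][:half] for i in indices],
            "party_b": [rows[i][half:] for i in indices],
            "labels": [labels[i] for i in indices],
        }

    return take(idx[:n_train]), take(idx[n_train:])


# Toy data: 100 samples with 23 features (the feature count reported for the Credit dataset).
rng = random.Random(42)
X = [[rng.random() for _ in range(23)] for _ in range(100)]
y = [rng.randint(0, 1) for _ in range(100)]

train, test = split_two_party(X, y)
print(len(train["labels"]), len(test["labels"]))
print(len(train["party_a"][0]), len(train["party_b"][0]))
```

With an odd number of features the two parties cannot hold exactly equal shares, which is consistent with the "approximately evenly distributed" wording in the quoted response.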