Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

SCOPE: Scalable Composite Optimization for Learning on Spark

Authors: Shen-Yi Zhao, Ru Xiang, Ying-Hao Shi, Peng Gao, Wu-Jun Li

AAAI 2017 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results on real datasets show that SCOPE can outperform other state-of-the-art distributed learning methods on Spark, including both batch learning methods and DSO methods.
Researcher Affiliation	Academia	Shen-Yi Zhao, Ru Xiang, Ying-Hao Shi, Peng Gao, Wu-Jun Li National Key Laboratory for Novel Software Technology Department of Computer Science and Technology, Nanjing University, China EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1 Task of Master in SCOPE
Open Source Code	Yes	The code can be downloaded from https://github.com/LIBBLE/LIBBLESpark/.
Open Datasets	Yes	We use four datasets for evaluation. They are MNIST-8M, epsilon, KDD12 and Data-A. The first two datasets can be downloaded from the LibSVM website. KDD12 is the dataset of Track 1 for KDD Cup 2012, which can be downloaded from the KDD Cup website.
Dataset Splits	No	The paper uses 'validation' in the context of Algorithm 2's local parameter updates ($u_{k,m+1} = u_{k,m} - \eta\,(\nabla f_{i_{k,m}}(u_{k,m}) - \nabla f_{i_{k,m}}(w_t) + z + c\,(u_{k,m} - w_t))$), but does not specify a separate dataset split for validation purposes to reproduce the experiment.
Hardware Specification	Yes	We have a Spark cluster of 33 machines (nodes) connected by 10GB Ethernet. Each machine has 12 Intel Xeon E5-2620 cores with 64GB memory.
Software Dependencies	Yes	We use Spark 1.5.2 for our experiment, and implement our SCOPE in Scala.
Experiment Setup	Yes	The regularization hyper-parameter λ is set to $10^{-4}$ for the first three datasets which are relatively small, and is set to $10^{-6}$ for the largest dataset Data-A. ... For all datasets, we set $c = \lambda \times 10^{-2}$. ... We set a small step-size $\eta = 10^{-5}$ and a large $M = 4000$.
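
Illustrative sketch (not the authors' code): the local update quoted in the Dataset Splits row, run with the hyper-parameters quoted in the Experiment Setup row, might look roughly like the Python below. It assumes the update has the form u <- u - eta * (grad_i(u) - grad_i(w_t) + z + c * (u - w_t)), that f_i is a plain logistic loss with the regularizer omitted for brevity, and that z is the average gradient at w_t on a single worker. The function names and the toy data are hypothetical; the released implementation is in Scala on Spark and is not reproduced here.

```python
import math
import random


def logistic_grad(w, x, y):
    """Gradient of log(1 + exp(-y * <w, x>)) w.r.t. w, for y in {-1, +1}."""
    margin = y * sum(wj * xj for wj, xj in zip(w, x))
    # sigmoid(-margin), evaluated without overflowing exp()
    if margin >= 0:
        e = math.exp(-margin)
        s = e / (1.0 + e)
    else:
        s = 1.0 / (1.0 + math.exp(margin))
    return [-y * s * xj for xj in x]


def full_gradient(w, data):
    """Average gradient over all instances (the role assumed for z)."""
    total = [0.0] * len(w)
    for x, y in data:
        g = logistic_grad(w, x, y)
        total = [tj + gj for tj, gj in zip(total, g)]
    return [tj / len(data) for tj in total]


def local_update(w_t, z, data, eta, c, num_steps, rng):
    """Run num_steps inner iterations starting from u = w_t.

    Each step draws one instance uniformly at random and applies
    u <- u - eta * (grad_i(u) - grad_i(w_t) + z + c * (u - w_t)).
    """
    u = list(w_t)
    for _ in range(num_steps):
        x, y = data[rng.randrange(len(data))]
        g_u = logistic_grad(u, x, y)
        g_w = logistic_grad(w_t, x, y)
        u = [
            uj - eta * (gu - gw + zj + c * (uj - wj))
            for uj, gu, gw, zj, wj in zip(u, g_u, g_w, z, w_t)
        ]
    return u


# Values quoted in the Experiment Setup row (smaller datasets).
LAMBDA = 1e-4       # regularization hyper-parameter; used here only to derive C
C = LAMBDA * 1e-2   # c = lambda * 10^-2
ETA = 1e-5          # step size
M = 4000            # number of inner iterations

# Hypothetical toy data: (features, label) pairs with labels in {-1, +1}.
toy_data = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([1.0, 1.0], 1)]
w0 = [0.0, 0.0]
z0 = full_gradient(w0, toy_data)
u_final = local_update(w0, z0, toy_data, ETA, C, M, random.Random(0))
```

In this sketch the first inner step reduces to u = w_t - eta * z, because the two per-instance gradients cancel and u - w_t is zero at the start; later steps differ only through the variance-reduction and proximity terms.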