Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

SCOPE: Scalable Composite Optimization for Learning on Spark

Authors: Shen-Yi Zhao, Ru Xiang, Ying-Hao Shi, Peng Gao, Wu-Jun Li

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results on real datasets show that SCOPE can outperform other state-of-the-art distributed learning methods on Spark, including both batch learning methods and DSO methods.
Researcher Affiliation | Academia | Shen-Yi Zhao, Ru Xiang, Ying-Hao Shi, Peng Gao, Wu-Jun Li; National Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, China; EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1 Task of Master in SCOPE
Open Source Code | Yes | The code can be downloaded from https://github.com/LIBBLE/LIBBLESpark/.
Open Datasets | Yes | We use four datasets for evaluation. They are MNIST-8M, epsilon, KDD12 and Data-A. The first two datasets can be downloaded from the LibSVM website. KDD12 is the dataset of Track 1 for KDD Cup 2012, which can be downloaded from the KDD Cup website.
Dataset Splits | No | The paper uses 'validation' in the context of Algorithm 2's local parameter updates ($u_{k,m+1} = u_{k,m} - \eta(\nabla f_{i_{k,m}}(u_{k,m}) - \nabla f_{i_{k,m}}(w_t) + z + c(u_{k,m} - w_t))$), but does not specify a separate dataset split for validation purposes to reproduce the experiment.
Hardware Specification | Yes | We have a Spark cluster of 33 machines (nodes) connected by 10GB Ethernet. Each machine has 12 Intel Xeon E5-2620 cores with 64GB memory.
Software Dependencies | Yes | We use Spark 1.5.2 for our experiment, and implement our SCOPE in Scala.
Experiment Setup | Yes | The regularization hyper-parameter λ is set to $10^{-4}$ for the first three datasets which are relatively small, and is set to $10^{-6}$ for the largest dataset Data-A. ... For all datasets, we set $c = \lambda \times 10^{-2}$. ... We set a small step-size $\eta = 10^{-5}$ and a large M = 4000.
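
To make the quoted settings concrete, the following is a minimal single-machine sketch of the local update quoted under Dataset Splits, run with the step size, M, and c reported under Experiment Setup. It is not the authors' Scala/Spark implementation. The L2-regularized logistic loss, the reading of the update as an SVRG-style step (gradient difference plus the master's full gradient z plus a proximity term), and the names `logistic_grad` and `local_update` are all assumptions made for illustration.

```python
import math
import random


def logistic_grad(w, x, y, lam):
    """Gradient of one L2-regularized logistic-loss term (assumed loss, label y in {-1, +1})."""
    margin = y * sum(wj * xj for wj, xj in zip(w, x))
    # Numerically stable evaluation of 1 / (1 + exp(margin)).
    if margin >= 0:
        e = math.exp(-margin)
        s = e / (1.0 + e)
    else:
        s = 1.0 / (1.0 + math.exp(margin))
    return [-y * s * xj + lam * wj for xj, wj in zip(x, w)]


def local_update(w_t, z, X, y, lam=1e-4, eta=1e-5, M=4000, seed=0):
    """Run M inner steps starting from w_t; z stands for the full gradient sent by the master."""
    c = lam * 1e-2  # reported setting: c = lambda * 10^-2
    rng = random.Random(seed)
    u = list(w_t)
    for _ in range(M):
        i = rng.randrange(len(y))  # sample one local instance uniformly
        g_u = logistic_grad(u, X[i], y[i], lam)
        g_w = logistic_grad(w_t, X[i], y[i], lam)
        # u <- u - eta * (grad_i(u) - grad_i(w_t) + z + c * (u - w_t))
        u = [uj - eta * (gu - gw + zj + c * (uj - wj))
             for uj, gu, gw, zj, wj in zip(u, g_u, g_w, z, w_t)]
    return u


# Hypothetical toy data, only to show the call shape.
X = [[1.0, 2.0], [0.5, -1.0]]
y = [1, -1]
w = [0.1, -0.2]
u = local_update(w, [1.0, -2.0], X, y)
```

With the reported step size of 1e-5, each inner step moves the local parameter only slightly, which is consistent with pairing a small step size with a large M.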