Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
SCOPE: Scalable Composite Optimization for Learning on Spark
Authors: Shen-Yi Zhao, Ru Xiang, Ying-Hao Shi, Peng Gao, Wu-Jun Li
AAAI 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on real datasets show that SCOPE can outperform other state-of-the-art distributed learning methods on Spark, including both batch learning methods and DSO methods. |
| Researcher Affiliation | Academia | Shen-Yi Zhao, Ru Xiang, Ying-Hao Shi, Peng Gao, Wu-Jun Li National Key Laboratory for Novel Software Technology Department of Computer Science and Technology, Nanjing University, China EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 Task of Master in SCOPE |
| Open Source Code | Yes | The code can be downloaded from https://github.com/LIBBLE/LIBBLESpark/. |
| Open Datasets | Yes | We use four datasets for evaluation. They are MNIST-8M, epsilon, KDD12 and Data-A. The first two datasets can be downloaded from the LibSVM website. KDD12 is the dataset of Track 1 for KDD Cup 2012, which can be downloaded from the KDD Cup website. |
| Dataset Splits | No | The paper uses 'validation' in the context of Algorithm 2's local parameter updates ($u_{k,m+1} = u_{k,m} - \eta(\nabla f_{i_{k,m}}(u_{k,m}) - \nabla f_{i_{k,m}}(w_t) + z + c(u_{k,m} - w_t))$), but does not specify a separate dataset split for validation purposes to reproduce the experiment. |
| Hardware Specification | Yes | We have a Spark cluster of 33 machines (nodes) connected by 10GB Ethernet. Each machine has 12 Intel Xeon E5-2620 cores with 64GB memory. |
| Software Dependencies | Yes | We use Spark 1.5.2 for our experiment, and implement our SCOPE in Scala. |
| Experiment Setup | Yes | The regularization hyper-parameter $\lambda$ is set to $10^{-4}$ for the first three datasets which are relatively small, and is set to $10^{-6}$ for the largest dataset Data-A. ... For all datasets, we set $c = \lambda \times 10^{-2}$. ... We set a small step-size $\eta = 10^{-5}$ and a large $M = 4000$. |
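
The local update quoted in the Dataset Splits row and the hyper-parameters quoted in the Experiment Setup row can be put together in a small sketch. The Python below is illustrative only and is not the authors' Scala implementation. It reads the update as a gradient difference with a proximal-style term `c * (u - w_t)`, and it makes several assumptions that the table does not state: an L2-regularized logistic loss, `z` taken to be the full gradient at `w_t`, uniform sampling of the local instance, and the helper names `logistic_grad`, `full_gradient`, `local_update`, and `worker_inner_loop`, which are hypothetical.

```python
import math
import random

# Values quoted in the Experiment Setup row (first three datasets).
LAMBDA = 1e-4        # regularization hyper-parameter
C = LAMBDA * 1e-2    # c = lambda x 10^-2
ETA = 1e-5           # step size
M = 4000             # number of inner iterations


def logistic_grad(w, x, y, lam):
    """Gradient of log(1 + exp(-y * <w, x>)) + (lam / 2) * ||w||^2 (assumed loss)."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    coef = -y / (1.0 + math.exp(margin))
    return [coef * xi + lam * wi for wi, xi in zip(w, x)]


def full_gradient(w, data, lam):
    """Average gradient over all instances; used here as z (an assumption)."""
    total = [0.0] * len(w)
    for x, y in data:
        g = logistic_grad(w, x, y, lam)
        total = [t + gi / len(data) for t, gi in zip(total, g)]
    return total


def local_update(u, w_t, z, x, y, eta, c, lam):
    """One step: u - eta * (grad_i(u) - grad_i(w_t) + z + c * (u - w_t))."""
    g_u = logistic_grad(u, x, y, lam)
    g_w = logistic_grad(w_t, x, y, lam)
    return [
        ui - eta * (gu - gw + zi + c * (ui - wi))
        for ui, gu, gw, zi, wi in zip(u, g_u, g_w, z, w_t)
    ]


def worker_inner_loop(w_t, z, data, eta=ETA, c=C, lam=LAMBDA, num_steps=M, seed=0):
    """Run num_steps local updates from w_t on one worker's data partition."""
    rng = random.Random(seed)
    u = list(w_t)
    for _ in range(num_steps):
        x, y = rng.choice(data)  # uniform sampling is an assumption
        u = local_update(u, w_t, z, x, y, eta, c, lam)
    return u


if __name__ == "__main__":
    # Toy partition with labels in {-1, +1}; not one of the paper's datasets.
    toy = [([1.0, 2.0], 1), ([-1.5, 0.5], -1), ([0.3, -0.7], 1)]
    w0 = [0.0, 0.0]
    print(worker_inner_loop(w0, full_gradient(w0, toy, LAMBDA), toy))
```

One property worth noting in this reading: when `u` equals `w_t`, the two per-instance gradients cancel and the `c` term vanishes, so the first inner step is a plain gradient step along `z`.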