SparkNet: Training Deep Networks in Spark

Authors: Philipp Moritz, Robert Nishihara, Ion Stoica, Michael Jordan

ICLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We quantify the dependence of the speedup obtained by SparkNet on the number of machines, the communication frequency, and the cluster's communication overhead, and we benchmark our system's performance on the ImageNet dataset.
Researcher Affiliation | Academia | Philipp Moritz, Robert Nishihara, Ion Stoica, Michael I. Jordan; Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720, USA; {pcmoritz,rkn,istoica,jordan}@eecs.berkeley.edu
Pseudocode | Yes | Listing 3: Distributed training example
Open Source Code | Yes | The code for SparkNet is available at https://github.com/amplab/SparkNet.
Open Datasets | Yes | We train the default Caffe model of AlexNet (Krizhevsky et al., 2012) on the ImageNet dataset (Russakovsky et al., 2015).
Dataset Splits | No | The paper mentions that 'The data is split among the Spark workers' but does not provide specific details on training, validation, and test splits (e.g., percentages, sample counts, or explicit standard split references).
Hardware Specification | Yes | To explore the scaling behavior of our algorithm and implementation, we perform experiments on EC2 using clusters of g2.8xlarge nodes. Each node has four NVIDIA GRID GPUs and 60GB memory.
Software Dependencies | No | The paper mentions using 'Apache Spark', 'Caffe deep learning library', 'Java Native Access', and 'Google Protocol Buffers' but does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | Each worker then runs SGD on the model with its subset of data for a fixed number of iterations τ (we use τ = 50 in Listing 3) or for a fixed length of time (a sketch of this training loop follows the table).
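The following is a minimal, hypothetical sketch of the parallel SGD pattern the paper's Listing 3 and experiment setup describe: the driver broadcasts the current weights, each worker runs τ = 50 local SGD iterations on its own data partition, and the driver averages the resulting weights. The Net and WeightCollection traits and their methods are assumptions made for illustration, not the actual SparkNet API.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Assumed interface for a per-worker network wrapper (not the real SparkNet API).
trait Net extends Serializable {
  def setWeights(w: WeightCollection): Unit
  def train(iterations: Int): Unit          // run SGD on this worker's data partition
  def getWeights: WeightCollection
}

// Assumed interface for a serializable collection of model parameters.
trait WeightCollection extends Serializable {
  def add(other: WeightCollection): WeightCollection
  def scale(factor: Double): WeightCollection
}

object DistributedTrainingSketch {
  // One round = broadcast weights, tau local SGD steps per worker, then average.
  def train(sc: SparkContext,
            nets: RDD[Net],                  // one Net per partition, assumed cached on the workers
            initialWeights: WeightCollection,
            rounds: Int = 1000,
            tau: Int = 50): WeightCollection = {
    val numWorkers = nets.count().toDouble
    var weights = initialWeights
    for (_ <- 1 to rounds) {
      val broadcastWeights = sc.broadcast(weights)   // ship the current model to all workers
      val updated = nets.map { net =>
        net.setWeights(broadcastWeights.value)       // start from the shared model
        net.train(tau)                               // tau = 50 local SGD iterations
        net.getWeights
      }
      // Average the workers' weights to obtain the next shared model.
      weights = updated.reduce(_.add(_)).scale(1.0 / numWorkers)
      broadcastWeights.unpersist()
    }
    weights
  }
}
```

Broadcasting and averaging only once every τ iterations keeps communication infrequent; this communication-frequency trade-off is what the speedup benchmarks quoted above quantify.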