The streaming rollout of deep networks - towards fully model-parallel execution

Authors: Volker Fischer, Jan Koehler, Thomas Pfeil

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this study, we present a theoretical framework to describe rollouts, the level of model-parallelization they induce, and demonstrate differences in solving specific tasks. We prove that certain rollouts, also for networks with only skip and no recurrent connections, enable earlier and more frequent responses, and show empirically that these early responses have better performance. The streaming rollout maximizes these properties and enables a fully parallel execution of the network, reducing runtime on massively parallel devices. Finally, we provide an open-source toolbox to design, train, evaluate, and interact with streaming rollouts. In Sec. 4, we show experimental results that emphasize the difference of rollouts both for networks with recurrent and skip connections and for networks with only skip connections. To demonstrate the significance of the chosen rollouts w.r.t. the runtime for inference and achieved accuracy, we compare the two extreme rollouts: the most model-parallel, i.e., the streaming rollout (R ≡ 1, results in red in Fig. 3), and the most sequential rollout (R(e) = 0 for the maximal number of edges, results in blue in Fig. 3). For all experiments and rollout patterns under consideration, we conduct inference on shallow rollouts (W = 1) and initialize the zero-th frame of the next rollout window with the last (i.e., 1st) frame of the preceding rollout window (see discussion in Sec. 5). Datasets: Rollout patterns are evaluated on three datasets: MNIST [50], CIFAR10 [51], and the German Traffic Sign Recognition Benchmark (GTSRB) [52]. Results: Rollouts are compared on the basis of their test accuracies over the duration (measured in update steps) needed to achieve these accuracies (Fig. 3a-c, e, and g). (An illustrative toy sketch contrasting the two extreme rollouts is given below the table.)
Researcher Affiliation | Industry | Volker Fischer, Bosch Center for Artificial Intelligence, Renningen, Germany, volker.fischer@de.bosch.com; Jan Köhler, Bosch Center for Artificial Intelligence, Renningen, Germany, jan.koehler@de.bosch.com; Thomas Pfeil, Bosch Center for Artificial Intelligence, Renningen, Germany, thomas.pfeil@de.bosch.com
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Finally, we provide an open-source toolbox to design, train, evaluate, and interact with streaming rollouts. We provide an open-source toolbox specifically designed to study streaming rollouts of deep neural networks. Both are available as open-source code (footnote 3: https://github.com/boschresearch/statestream).
Open Datasets | Yes | Datasets: Rollout patterns are evaluated on three datasets: MNIST [50], CIFAR10 [51], and the German Traffic Sign Recognition Benchmark (GTSRB) [52]. (A dataset-loading sketch is given below the table.)
Dataset Splits | No | The paper mentions evaluating on datasets and test accuracies, but it does not specify explicit training/validation/test dataset splits with percentages, sample counts, or citations to predefined splits.
Hardware Specification | No | The paper discusses hardware generally and mentions potential future hardware like the "True North chip [56, 57]" in the context of massively parallel execution. However, it does not specify the particular hardware (e.g., GPU/CPU models, memory details) used to run the experiments reported in the paper.
Software Dependencies | No | For the experiments presented here, we use the Keras toolbox to compare different rollout patterns. Additionally, we implemented an experimental toolbox (TensorFlow and Theano backends) to study (define, train, evaluate, and visualize) networks using the streaming rollout pattern (see Sec. A3). While software names are mentioned, specific version numbers for Keras, TensorFlow, or Theano are not provided. (A version-logging sketch is given below the table.)
Experiment Setup | Yes | Details about data, preprocessing, network architectures, and the training process are given in Sec. A2. All experiments were implemented in Keras [61] with either TensorFlow [60] or Theano [59] backend. Networks were trained with RMSprop [58] with a learning rate of 10⁻⁴. To train the SR and S networks (Fig. A2), we used a batch size of 128 and trained for 100 epochs, with 10 epochs of early stopping on the validation set accuracy. For the DSR networks (Fig. A3), we used a batch size of 256 and 200 epochs, with 20 epochs of early stopping. (A training-configuration sketch is given below the table.)
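
To make the contrast between the two extreme rollouts concrete, the following NumPy sketch rolls out a toy three-layer network with one skip connection under both patterns. It is not taken from the paper or its toolbox; the toy architecture, weights, and function names (step_streaming, step_sequential) are illustrative assumptions. Under the streaming rollout every node reads only the previous frame, so all node updates within a frame are independent and could run in parallel; under the most sequential rollout, nodes within a frame depend on each other and must be updated in order.

```python
import numpy as np

rng = np.random.default_rng(0)
W_in_h1 = rng.standard_normal((4, 8))    # input   -> hidden1
W_h1_h2 = rng.standard_normal((8, 8))    # hidden1 -> hidden2
W_in_h2 = rng.standard_normal((4, 8))    # skip:    input -> hidden2
W_h2_out = rng.standard_normal((8, 2))   # hidden2 -> output

def relu(x):
    return np.maximum(x, 0.0)

def step_streaming(state, x):
    # Streaming rollout (R(e) = 1 for every edge): each node reads only the
    # previous frame, so the three updates below are mutually independent
    # and could be executed simultaneously on separate devices.
    h1, h2, _ = state
    return (relu(x @ W_in_h1),
            relu(h1 @ W_h1_h2 + x @ W_in_h2),   # h1 from the previous frame
            h2 @ W_h2_out)                      # h2 from the previous frame

def step_sequential(state, x):
    # Most sequential rollout (R(e) = 0 for the maximal number of edges):
    # nodes are updated in topological order within a single frame, so each
    # update has to wait for the one before it.
    h1 = relu(x @ W_in_h1)
    h2 = relu(h1 @ W_h1_h2 + x @ W_in_h2)       # h1 from the *same* frame
    return h1, h2, h2 @ W_h2_out

x = rng.standard_normal(4)
state = (np.zeros(8), np.zeros(8), np.zeros(2))
for frame in range(1, 4):
    state = step_streaming(state, x)
    print("streaming, frame %d: output %s" % (frame, state[2]))
print("sequential, one frame: output %s" % (step_sequential(state, x)[2],))
```

In the streaming version the output only carries information from the input after as many frames as the longest path through the network, but every node can be updated at every frame, which is what enables fully model-parallel execution and the earlier, more frequent responses discussed above.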
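
For a reproduction, two of the three datasets can be obtained through Keras' built-in loaders; GTSRB is not bundled with Keras and has to be downloaded separately from the benchmark website. This is a generic sketch, not the paper's preprocessing pipeline (which is described in Sec. A2 of the paper).

```python
from keras.datasets import mnist, cifar10

# MNIST: 60,000 training / 10,000 test grayscale images, 28x28
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# CIFAR10: 50,000 training / 10,000 test RGB images, 32x32
(c_train, cy_train), (c_test, cy_test) = cifar10.load_data()
print(x_train.shape, c_train.shape)
```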
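
Because library versions are not pinned in the paper, a reproduction should at least log the environment it actually runs in. A minimal sketch, assuming a classic multi-backend Keras installation:

```python
import keras
import keras.backend as K

# Record the library version and the active backend ('tensorflow' or 'theano')
# alongside the experiment results.
print("Keras version:", keras.__version__)
print("Keras backend:", K.backend())
```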
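
The reported hyperparameters for the SR and S networks translate roughly into the following Keras training call. This is a hedged sketch: build_model and the data variables are placeholders, the loss and the monitored metric name ('val_acc' in Keras 2.x, 'val_accuracy' in later versions) are assumptions, and the actual architectures are specified in Sec. A2 of the paper.

```python
from keras.optimizers import RMSprop
from keras.callbacks import EarlyStopping

model = build_model()  # placeholder for the rollout-specific architecture (Sec. A2)
model.compile(optimizer=RMSprop(lr=1e-4),           # learning rate 10^-4, as reported
              loss="categorical_crossentropy",      # assumed; all tasks are classification
              metrics=["accuracy"])

model.fit(x_train, y_train,
          batch_size=128,                           # 256 for the DSR networks
          epochs=100,                               # 200 for the DSR networks
          validation_data=(x_val, y_val),
          callbacks=[EarlyStopping(monitor="val_acc",   # 'val_accuracy' in newer Keras
                                   patience=10)])       # 20 for the DSR networks
```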