DeepPCR: Parallelizing Sequential Operations in Neural Networks
Authors: Federico Danieli, Miguel Sarabia, Xavier Suau Cuadros, Pau Rodriguez, Luca Zappella
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify the theoretical lower complexity of the algorithm, and to identify regimes for speedup, we test the effectiveness of DeepPCR in parallelizing the forward and backward pass in multi-layer perceptrons, and reach speedups of up to 30× for the forward, and 200× for the backward pass. |
| Researcher Affiliation | Industry | Apple {f_danieli, miguelsdc, xsuaucuadros, pau.rodriguez, lzappella}@apple.com |
| Pseudocode | Yes | Pseudo-code for the adapted algorithm is reported in Alg. 1, and a schematic of how the reduction is performed is outlined in Fig. 2. (An illustrative reduction sketch is included below this table.) |
| Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We train a deep ResNet model composed of only fully-connected layers... trained on a classification task on MNIST [8]... acceleration of training of deep ResNet [15] on MNIST [8], and generation in Diffusion Models trained on MNIST, CIFAR-10 [25] and CelebA [29]. (A data-loading sketch is included below this table.) |
| Dataset Splits | No | The paper mentions training and testing, and uses the term 'validation' in section 4.3 (referring to 'reproducing the validation'), but does not provide specific details about train/validation/test dataset splits, percentages, or methodology for creating such splits. |
| Hardware Specification | Yes | All the experiments in this section were conducted on a V100 GPU with 40GB of RAM; our models are built using the PyTorch framework, without any form of neural network compilation. We run the same experiments in Sec. 4.1 on two different Graphical Processing Units (GPUs): an NVIDIA Tesla V100, with 40GB of available memory, and an NVIDIA Tesla A100, with 80GB. Trained all models on a single A100 GPU with 80GB of VRAM. |
| Software Dependencies | No | Our models are built using the PyTorch framework, without any form of neural network compilation. No specific version number for PyTorch or other software dependencies is provided. |
| Experiment Setup | Yes | We train for 8 epochs using an SGD optimizer with a learning rate of 10⁻³ without a scheduler. The models are trained with AdamW with a batch size of 4096 and a constant learning rate of 10⁻⁴ for 400 epochs. (These settings are restated as a configuration sketch below this table.) |
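The paper's Alg. 1 (referenced in the Pseudocode row) applies Parallel Cyclic Reduction to the linearized system arising at each Newton step; that algorithm is not reproduced here. As a minimal sketch of the underlying idea, namely collapsing a length-L sequential recurrence into O(log L) parallel rounds, the hypothetical PyTorch helper below solves the scalar linear recurrence z[k] = a[k]·z[k-1] + b[k] by repeatedly folding each equation into the one that follows it. Function and variable names are illustrative, not taken from the paper.

```python
import torch


def pcr_linear_recurrence(a: torch.Tensor, b: torch.Tensor, z0: torch.Tensor) -> torch.Tensor:
    """Solve z[k] = a[k] * z[k-1] + b[k] (with z[-1] = z0) in O(log L) parallel rounds.

    a, b: tensors of shape (L, ...) holding the per-step coefficients.
    z0:   initial state, broadcastable to b[0].
    """
    a, b = a.clone(), b.clone()
    L, stride = a.shape[0], 1
    while stride < L:
        # Fold equation k - stride into equation k, for every eligible k at once.
        a_prev, b_prev = a[:-stride], b[:-stride]
        b = torch.cat([b[:stride], a[stride:] * b_prev + b[stride:]])
        a = torch.cat([a[:stride], a[stride:] * a_prev])
        stride *= 2
    # Every equation now reads z[k] = a[k] * z0 + b[k].
    return a * z0 + b


# Quick check against the sequential definition (illustrative only).
L = 8
a, b, z0 = torch.rand(L), torch.rand(L), torch.rand(())
z, z_seq = z0, []
for k in range(L):
    z = a[k] * z + b[k]
    z_seq.append(z)
torch.testing.assert_close(pcr_linear_recurrence(a, b, z0), torch.stack(z_seq))
```

The number of folding rounds grows as ceil(log2 L) rather than L, which is the source of the logarithmic-depth claim; the paper's setting replaces the scalar coefficients with the blocks of the Newton-linearized system.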
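Regarding the Open Datasets row: all three datasets named in the paper are distributed with torchvision, so a loading sketch is straightforward. The root path, split choices, and transform below are placeholders rather than details taken from the paper.

```python
import torchvision
from torchvision import transforms

# Placeholder preprocessing; the paper's exact transforms are not stated in this table.
to_tensor = transforms.ToTensor()

# Datasets named in the paper (MNIST, CIFAR-10, CelebA); paths and splits are illustrative.
mnist = torchvision.datasets.MNIST(root="data", train=True, download=True, transform=to_tensor)
cifar10 = torchvision.datasets.CIFAR10(root="data", train=True, download=True, transform=to_tensor)
celeba = torchvision.datasets.CelebA(root="data", split="train", download=True, transform=to_tensor)
```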
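Regarding the Experiment Setup row: the two quoted optimizer configurations map directly onto PyTorch calls. The sketch below uses placeholder models, since the table does not state which architecture each configuration belongs to; only the optimizer names, learning rates, batch size, and epoch counts come from the quote.

```python
import torch

# Placeholder models; the actual architectures (deep fully-connected ResNet, diffusion models)
# are defined in the paper and not reproduced here.
model_a = torch.nn.Linear(28 * 28, 10)
model_b = torch.nn.Linear(28 * 28, 28 * 28)

# First configuration quoted in the table: SGD, lr = 1e-3, no scheduler, 8 epochs.
optimizer_a = torch.optim.SGD(model_a.parameters(), lr=1e-3)
num_epochs_a = 8

# Second configuration quoted in the table: AdamW, constant lr = 1e-4, batch size 4096, 400 epochs.
optimizer_b = torch.optim.AdamW(model_b.parameters(), lr=1e-4)
batch_size_b, num_epochs_b = 4096, 400
```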