DeepPCR: Parallelizing Sequential Operations in Neural Networks
Authors: Federico Danieli, Miguel Sarabia, Xavier Suau Cuadros, Pau Rodriguez, Luca Zappella
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify the theoretical lower complexity of the algorithm, and to identify regimes for speedup, we test the effectiveness of DeepPCR in parallelizing the forward and backward pass in multi-layer perceptrons, and reach speedups of up to 30× for the forward, and 200× for the backward pass. |
| Researcher Affiliation | Industry | Apple {f_danieli, miguelsdc, xsuaucuadros, pau.rodriguez, lzappella}@apple.com |
| Pseudocode | Yes | Pseudo-code for the adapted algorithm is reported in Alg. 1, and a schematic of how the reduction is performed is outlined in Fig. 2. (An illustrative reduction sketch is included below this table.) |
| Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We train a deep ResNet model composed of only fully-connected layers... trained on a classification task on MNIST [8]... acceleration of training of deep ResNet [15] on MNIST [8], and generation in Diffusion Models trained on MNIST, CIFAR-10 [25] and CelebA [29]. (A data-loading sketch is included below this table.) |
| Dataset Splits | No | The paper mentions training and testing, and uses the term 'validation' in section 4.3 (referring to 'reproducing the validation'), but does not provide specific details about train/validation/test dataset splits, percentages, or methodology for creating such splits. |
| Hardware Specification | Yes | All the experiments in this section were conducted on a V100 GPU with 40GB of RAM; our models are built using the PyTorch framework, without any form of neural network compilation. We run the same experiments in Sec. 4.1 on two different Graphical Processing Units (GPUs): an NVIDIA Tesla V100, with 40GB of available memory, and an NVIDIA Tesla A100, with 80GB. Trained all models on a single A100 GPU with 80GB of VRAM. |
| Software Dependencies | No | Our models are built using the PyTorch framework, without any form of neural network compilation. No specific version number for PyTorch or other software dependencies is provided. |
| Experiment Setup | Yes | We train for 8 epochs using an SGD optimizer with a learning rate of 10⁻³ without a scheduler. The models are trained with AdamW with a batch size of 4096 and a constant learning rate of 10⁻⁴ for 400 epochs. (These settings are restated as a configuration sketch below this table.) |
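The paper's Alg. 1 (referenced in the Pseudocode row) applies Parallel Cyclic Reduction to the linearized system arising at each Newton step; that algorithm is not reproduced here. As a minimal sketch of the underlying idea, namely collapsing a length-L sequential recurrence into O(log L) parallel rounds, the hypothetical PyTorch helper below solves the scalar linear recurrence z[k] = a[k]·z[k-1] + b[k] by repeatedly folding each equation into the one that follows it. Function and variable names are illustrative, not taken from the paper.

```python
import torch


def pcr_linear_recurrence(a: torch.Tensor, b: torch.Tensor, z0: torch.Tensor) -> torch.Tensor:
    """Solve z[k] = a[k] * z[k-1] + b[k] (with z[-1] = z0) in O(log L) parallel rounds.

    a, b: tensors of shape (L, ...) holding the per-step coefficients.
    z0:   initial state, broadcastable to b[0].
    """
    a, b = a.clone(), b.clone()
    L, stride = a.shape[0], 1
    while stride < L:
        # Fold equation k - stride into equation k, for every eligible k at once.
        a_prev, b_prev = a[:-stride], b[:-stride]
        b = torch.cat([b[:stride], a[stride:] * b_prev + b[stride:]])
        a = torch.cat([a[:stride], a[stride:] * a_prev])
        stride *= 2
    # Every equation now reads z[k] = a[k] * z0 + b[k].
    return a * z0 + b


# Quick check against the sequential definition (illustrative only).
L = 8
a, b, z0 = torch.rand(L), torch.rand(L), torch.rand(())
z, z_seq = z0, []
for k in range(L):
    z = a[k] * z + b[k]
    z_seq.append(z)
torch.testing.assert_close(pcr_linear_recurrence(a, b, z0), torch.stack(z_seq))
```

The number of folding rounds grows as ceil(log2 L) rather than L, which is the source of the logarithmic-depth claim; the paper's setting replaces the scalar coefficients with the blocks of the Newton-linearized system.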
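Regarding the Open Datasets row: all three datasets named in the paper are distributed with torchvision, so a loading sketch is straightforward. The root path, split choices, and transform below are placeholders rather than details taken from the paper.

```python
import torchvision
from torchvision import transforms

# Placeholder preprocessing; the paper's exact transforms are not stated in this table.
to_tensor = transforms.ToTensor()

# Datasets named in the paper (MNIST, CIFAR-10, CelebA); paths and splits are illustrative.
mnist = torchvision.datasets.MNIST(root="data", train=True, download=True, transform=to_tensor)
cifar10 = torchvision.datasets.CIFAR10(root="data", train=True, download=True, transform=to_tensor)
celeba = torchvision.datasets.CelebA(root="data", split="train", download=True, transform=to_tensor)
```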
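Regarding the Experiment Setup row: the two quoted optimizer configurations map directly onto PyTorch calls. The sketch below uses placeholder models, since the table does not state which architecture each configuration belongs to; only the optimizer names, learning rates, batch size, and epoch counts come from the quote.

```python
import torch

# Placeholder models; the actual architectures (deep fully-connected ResNet, diffusion models)
# are defined in the paper and not reproduced here.
model_a = torch.nn.Linear(28 * 28, 10)
model_b = torch.nn.Linear(28 * 28, 28 * 28)

# First configuration quoted in the table: SGD, lr = 1e-3, no scheduler, 8 epochs.
optimizer_a = torch.optim.SGD(model_a.parameters(), lr=1e-3)
num_epochs_a = 8

# Second configuration quoted in the table: AdamW, constant lr = 1e-4, batch size 4096, 400 epochs.
optimizer_b = torch.optim.AdamW(model_b.parameters(), lr=1e-4)
batch_size_b, num_epochs_b = 4096, 400
```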