Improving Neural Network Training in Low Dimensional Random Bases
Authors: Frithjof Gressmann, Zach Eaton-Rosen, Carlo Luschi
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we revisit optimization in low-dimensional random subspaces with the aim of improving its practical optimization performance. We show that while random subspace projections have computational benefits such as easy distribution on many workers, they become less efficient with growing projection dimensionality, or if the subspace projection is fixed throughout training. We observe that applying smaller independent random projections to different parts of the network and re-drawing them at every step significantly improves the obtained accuracy on fully-connected and several convolutional architectures, including ResNets on the MNIST, Fashion-MNIST and CIFAR-10 datasets. Table 1 reports the validation accuracy after 100 epochs (plots in Supplementary Material, Figure B.6). All methods other than SGD use a dimensionality reduction factor of 400. |
| Researcher Affiliation | Industry | Frithjof Gressmann, Graphcore Research, Bristol, UK, frithjof@graphcore.ai; Zach Eaton-Rosen, Graphcore Research, London, UK, zacher@graphcore.ai; Carlo Luschi, Graphcore Research, Bristol, UK, carlo@graphcore.ai |
| Pseudocode | Yes | Algorithm 1: Training procedures for a single worker (left) and for parallelized workers (right). |
| Open Source Code | Yes | Our source code is available at https://github.com/graphcore-research/random-bases |
| Open Datasets | Yes | fully-connected and several convolutional architectures, including ResNets on the MNIST, Fashion-MNIST and CIFAR-10 datasets. |
| Dataset Splits | No | The paper mentions 'validation accuracy' for standard datasets (MNIST, FMNIST, CIFAR-10) but does not explicitly state the training, validation, or test split percentages or sample counts for these datasets, nor does it provide a citation for specific predefined splits used. |
| Hardware Specification | Yes | To meet the algorithmic demand for fast pseudo-random number generation (PRNG), we conduct these experiments using Graphcore's first generation Intelligence Processing Unit (IPU). The Colossus MK1 IPU (GC2) accelerator is composed of 1216 independent cores with in-core PRNG hardware units that can generate up to 944 billion random samples per second [22]. On a single IPU, random bases descent training of the CIFAR-10 CNN with the extremely sample intensive dimension d = 10k achieved a throughput of 31 images per second (100 epochs / 1.88 days), whereas training the same model on an 80-core CPU machine achieved 2.6 images/second (100 epochs / 22.5 days). To rule out the possibility that the measured speedup can be attributed to the forward-backward acceleration only, we also measured the throughput of our implementation on a V100 GPU accelerator but found no significant throughput improvement relative to the CPU baseline. |
| Software Dependencies | No | The paper mentions 'TensorFlow implementation' but does not specify a version number for TensorFlow or any other software libraries or dependencies used. |
| Experiment Setup | Yes | All networks use ReLU nonlinearities and are trained with a softmax cross-entropy loss on the image classification tasks MNIST, Fashion-MNIST (FMNIST), and CIFAR-10. Unless otherwise noted, basis vectors are drawn from a normal distribution and normalized. We do not explicitly orthogonalize, but rely on the quasi-orthogonality of random directions in high dimensions [13]. Further details can be found in the Supplementary Material. ... Table 1 reports the validation accuracy after 100 epochs ... We train the CNN on CIFAR-10 for 2000 epochs (2.5 million steps)... Input: Learning rate η_RBD, network initialization θ_{t=0} |
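The core update quoted above (normalized random basis vectors, re-drawn at every step, no explicit orthogonalization) can be sketched in a few lines of NumPy. This is a minimal illustration on a toy quadratic objective, not the paper's implementation: the objective, dimensions, learning rate, and step count are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(theta):
    # Toy quadratic objective standing in for a network's training loss.
    return 0.5 * np.sum(theta ** 2)

def loss_grad(theta):
    # Gradient of the toy objective in the full parameter space.
    return theta

D = 1000    # full parameter count (illustrative)
d = 10      # random subspace dimension, d << D
eta = 1.0   # learning rate (η_RBD in the paper's notation; value illustrative)
theta = rng.normal(size=D)
init_loss = loss(theta)

for step in range(500):
    # Re-draw a fresh random basis at every step, as the paper finds beneficial.
    B = rng.normal(size=(d, D))
    # Normalize each direction; rely on quasi-orthogonality in high dimensions.
    B /= np.linalg.norm(B, axis=1, keepdims=True)
    # Directional derivatives of the loss along the d basis vectors.
    g = B @ loss_grad(theta)
    # Descend only within the current d-dimensional random subspace.
    theta -= eta * (g @ B)

final_loss = loss(theta)
```

Each step only requires the d directional derivatives g, which is what makes the scheme cheap to distribute: workers sharing a PRNG seed can regenerate B locally and exchange just the d coefficients rather than full D-dimensional gradients.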