What if Neural Networks had SVDs?

Authors: Alexander Mathiasen, Frederik Hvilshøj, Jakob Rødsgaard Jørgensen, Anshul Nasery, Davide Mottin

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This section contains two experiments. Section 4.1 compares the running time of FastH against alternatives. Section 4.2 shows that FastH speeds up matrix operations. To simulate a realistic machine learning environment, we performed all experiments on a standard machine learning server using a single NVIDIA RTX 2080 Ti. Figure 1: Time consumption of matrix inversion in Neural Networks. The plot compares FastH against the sequential algorithm from [17] (see Section 4).
Researcher Affiliation | Academia | Aarhus University, {alexander.mathiasen, fhvilshoj, mrjakobdk}@gmail.com, davide@cs.au.dk; Indian Institute of Technology, Bombay, anshulnasery@gmail.com
Pseudocode | Yes | Algorithm 1 FastH Forward. Algorithm 2 FastH Backward. (A sketch of the sequential Householder computation these algorithms accelerate appears after the table.)
Open Source Code | Yes | Code: github.com/AlexanderMath/fasth/
Open Datasets | No | The paper describes experiments on weight matrices of varying sizes (d × d) and a mini-batch size of m = 32, but does not specify the use of any named, publicly available datasets such as MNIST, CIFAR-10, or ImageNet, nor does it provide access information for any specific dataset.
Dataset Splits | No | The paper focuses on the running time of matrix operations within neural networks rather than evaluating model performance on specific datasets with train/validation/test splits. Therefore, it does not specify any validation dataset splits.
Hardware Specification | Yes | To simulate a realistic machine learning environment, we performed all experiments on a standard machine learning server using a single NVIDIA RTX 2080 Ti.
Software Dependencies | No | The paper mentions implementing FastH in PyTorch and CUDA, and using open-source implementations for comparison ('exp RNN' and 'spectral-RNN'), but it does not provide specific version numbers for any of these software dependencies. (A sketch of how a reproduction could record its own versions appears after the table.)
Experiment Setup | Yes | We measure the time of a gradient descent step with a weight matrix $W \in \mathbb{R}^{d \times d}$ and a mini-batch $X \in \mathbb{R}^{d \times m}$, where $m = 32$ and $d = 1 \cdot 64, 2 \cdot 64, \ldots, 48 \cdot 64$. We ran each algorithm 100 times, and we report mean time $\mu$ with error bars $[\mu - \sigma, \mu + \sigma]$, where $\sigma$ is the standard deviation of the running time over the 100 repetitions. (A sketch of this timing protocol appears after the table.)
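
As context for the Pseudocode row: Algorithm 1 and Algorithm 2 replace the sequential application of Householder reflections with a faster blocked scheme. The sketch below shows only the sequential baseline computation, assuming the standard parameterization $H_i = I - 2 v_i v_i^\top / \|v_i\|^2$; the function name and tensor layout are illustrative assumptions, not the authors' code.

```python
import torch

def sequential_householder_forward(V: torch.Tensor, X: torch.Tensor) -> torch.Tensor:
    """Apply the product H_1 H_2 ... H_d of Householder reflections to a mini-batch X.

    V: (d, d) tensor whose i-th row is the Householder vector v_i (illustrative layout).
    X: (d, m) mini-batch.
    Each H_i = I - 2 v_i v_i^T / ||v_i||^2 is applied as a rank-1 update, without
    materializing the d x d matrix.
    """
    out = X
    d = V.shape[0]
    # d sequential, vector-level steps: this serial dependency is the bottleneck
    # that the paper's blocked algorithm reorganizes into larger matrix products.
    for i in range(d - 1, -1, -1):
        v = V[i].unsqueeze(1)                   # (d, 1)
        beta = 2.0 / (v.T @ v)                  # (1, 1): 2 / ||v_i||^2
        out = out - beta * (v @ (v.T @ out))    # rank-1 update, O(d*m) per reflection
    return out
```

Looping from i = d - 1 down to 0 applies H_d to X first, so the overall result is H_1 ··· H_d X; the ordering convention here is an assumption.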
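
Since no library versions are given (Software Dependencies row), a reproduction would need to record its own environment. A minimal sketch using standard PyTorch version attributes; the output format is only an illustration:

```python
import torch

# Log the framework, CUDA build, and GPU actually used for the timing runs.
print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```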
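
The Experiment Setup row describes the paper's timing protocol (100 repetitions, mean $\mu$ with error bars $[\mu - \sigma, \mu + \sigma]$). The sketch below is one way such a measurement could be set up in PyTorch with CUDA synchronization; `step_fn` is a hypothetical stand-in for one gradient-descent step and is not taken from the paper's code.

```python
import time
import torch

def time_gradient_step(step_fn, repeats: int = 100):
    """Run `step_fn` (one gradient-descent step) `repeats` times and time it on the GPU.

    Returns (mean, std) of the wall-clock time per step in seconds, matching the
    mu and sigma used for the paper's error bars [mu - sigma, mu + sigma].
    """
    times = []
    for _ in range(repeats):
        torch.cuda.synchronize()               # finish any previously queued GPU work
        start = time.perf_counter()
        step_fn()                              # forward + backward + parameter update
        torch.cuda.synchronize()               # wait for this step's kernels to complete
        times.append(time.perf_counter() - start)
    t = torch.tensor(times)
    return t.mean().item(), t.std().item()

# Illustrative usage with the paper's sizes: m = 32 and d = k * 64 for k = 1, ..., 48.
# The quadratic loss below is a placeholder, not the operation benchmarked in the paper.
# d, m = 1024, 32
# W = torch.randn(d, d, device="cuda", requires_grad=True)
# X = torch.randn(d, m, device="cuda")
# opt = torch.optim.SGD([W], lr=1e-3)
# def step():
#     opt.zero_grad()
#     (W @ X).pow(2).sum().backward()
#     opt.step()
# mu, sigma = time_gradient_step(step)
```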