What if Neural Networks had SVDs?
Authors: Alexander Mathiasen, Frederik Hvilshøj, Jakob Rødsgaard Jørgensen, Anshul Nasery, Davide Mottin
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section contains two experiments. Section 4.1 compares the running time of FastH against alternatives. Section 4.2 shows that FastH speeds up matrix operations. To simulate a realistic machine learning environment, we performed all experiments on a standard machine learning server using a single NVIDIA RTX 2080 Ti. Figure 1: Time consumption of matrix inversion in Neural Networks. The plot compares FastH against the sequential algorithm from [17] (see Section 4). |
| Researcher Affiliation | Academia | Aarhus University: {alexander.mathiasen, fhvilshoj, mrjakobdk}@gmail.com, davide@cs.au.dk; Indian Institute of Technology Bombay: anshulnasery@gmail.com |
| Pseudocode | Yes | Algorithm 1: FastH Forward. Algorithm 2: FastH Backward. (A hedged sketch of the Householder product these algorithms operate on appears below the table.) |
| Open Source Code | Yes | Code: github.com/AlexanderMath/fasth |
| Open Datasets | No | The paper describes experiments on weight matrices of varying sizes (d × d) with a mini-batch size of m = 32, but it does not name any publicly available dataset such as MNIST, CIFAR-10, or ImageNet, nor does it provide access information for any specific dataset. |
| Dataset Splits | No | The paper focuses on the running time of matrix operations within Neural Networks rather than evaluating model performance on specific datasets with train/validation/test splits. Therefore, it does not specify any validation dataset splits. |
| Hardware Specification | Yes | To simulate a realistic machine learning environment, we performed all experiments on a standard machine learning server using a single NVIDIA RTX 2080 Ti. |
| Software Dependencies | No | The paper mentions implementing FastH in PyTorch and CUDA, and using open-source implementations for comparison ('expRNN' and 'spectral-RNN'), but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | We measure the time of a gradient descent step with a weight matrix W ∈ ℝ^{d×d} and a mini-batch X ∈ ℝ^{d×m}, where m = 32 and d = 1·64, 2·64, ..., 48·64. We ran each algorithm 100 times, and we report mean time μ with error bars [μ − σ, μ + σ], where σ is the standard deviation of running time over the 100 repetitions. (A timing sketch following this protocol appears after the table.) |
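
The pseudocode row above refers to the paper's Algorithm 1 (FastH Forward) and Algorithm 2 (FastH Backward), which accelerate products of Householder matrices. For reference, here is a minimal PyTorch sketch of the sequential Householder product that FastH speeds up; the function name and tensor shapes are illustrative assumptions, not code from the fasth repository.

```python
import torch

def apply_householders(V: torch.Tensor, X: torch.Tensor) -> torch.Tensor:
    """Compute (H_d ... H_2 H_1) X, where H_i = I - 2 v_i v_i^T / ||v_i||^2.

    V: (d, d) tensor whose rows are the Householder vectors v_i (assumed layout).
    X: (d, m) mini-batch.
    """
    for v in V:  # sequential baseline; FastH instead processes vectors in groups
        v = v.unsqueeze(1)                        # (d, 1)
        X = X - 2.0 * v @ (v.T @ X) / (v.T @ v)   # rank-1 update, O(dm) per vector
    return X
```

Each H_i is orthogonal, so their product is orthogonal; this is what lets the paper maintain an explicit SVD-style factorization of the weight matrix.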
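The experiment-setup row describes the paper's timing protocol. Below is a minimal sketch of that protocol, assuming CUDA events for GPU timing; the operation being timed is a placeholder standing in for FastH and its baselines, and `time_step` is a hypothetical helper, not part of the paper's code.

```python
import torch

def time_step(d: int, m: int = 32, reps: int = 100) -> tuple[float, float]:
    """Time `reps` gradient-descent steps; return mean and std in milliseconds."""
    W = torch.randn(d, d, device="cuda", requires_grad=True)
    X = torch.randn(d, m, device="cuda")
    times = []
    for _ in range(reps):
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        loss = (W @ X).pow(2).sum()  # placeholder op, not the paper's FastH kernel
        loss.backward()
        end.record()
        torch.cuda.synchronize()
        times.append(start.elapsed_time(end))
        W.grad = None
    t = torch.tensor(times)
    return t.mean().item(), t.std().item()  # mu, sigma for error bars [mu - sigma, mu + sigma]

# The paper sweeps d = 1*64, 2*64, ..., 48*64; a few sample sizes:
for d in (64, 1024, 3072):
    mu, sigma = time_step(d)
    print(f"d={d}: {mu:.2f} ± {sigma:.2f} ms")
```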