What if Neural Networks had SVDs?

Authors: Alexander Mathiasen, Frederik Hvilshøj, Jakob Rødsgaard Jørgensen, Anshul Nasery, Davide Mottin

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This section contains two experiments. Section 4.1 compares the running time of FastH against alternatives. Section 4.2 shows that FastH speeds up matrix operations. To simulate a realistic machine learning environment, we performed all experiments on a standard machine learning server using a single NVIDIA RTX 2080 Ti. Figure 1: Time consumption of matrix inversion in Neural Networks. The plot compares FastH against the sequential algorithm from [17] (see Section 4).
Researcher Affiliation | Academia | Aarhus University, {alexander.mathiasen, fhvilshoj, mrjakobdk}@gmail.com, davide@cs.au.dk; Indian Institute of Technology, Bombay, anshulnasery@gmail.com
Pseudocode | Yes | Algorithm 1 FastH Forward. Algorithm 2 FastH Backward. (A sketch of the sequential Householder computation these algorithms accelerate appears after the table.)
Open Source Code | Yes | Code: github.com/AlexanderMath/fasth/
Open Datasets | No | The paper describes experiments on weight matrices of varying sizes (d × d) and a mini-batch size of m = 32, but does not specify the use of any named, publicly available datasets such as MNIST, CIFAR-10, or ImageNet, nor does it provide access information for any specific dataset.
Dataset Splits | No | The paper focuses on the running time of matrix operations within neural networks rather than evaluating model performance on specific datasets with train/validation/test splits. Therefore, it does not specify any validation dataset splits.
Hardware Specification | Yes | To simulate a realistic machine learning environment, we performed all experiments on a standard machine learning server using a single NVIDIA RTX 2080 Ti.
Software Dependencies | No | The paper mentions implementing FastH in PyTorch and CUDA, and using open-source implementations for comparison ('exp RNN' and 'spectral-RNN'), but it does not provide specific version numbers for any of these software dependencies. (A sketch of how a reproduction could record its own versions appears after the table.)
Experiment Setup | Yes | We measure the time of a gradient descent step with a weight matrix $W \in \mathbb{R}^{d \times d}$ and a mini-batch $X \in \mathbb{R}^{d \times m}$, where $m = 32$ and $d = 1 \cdot 64, 2 \cdot 64, \ldots, 48 \cdot 64$. We ran each algorithm 100 times, and we report mean time $\mu$ with error bars $[\mu - \sigma, \mu + \sigma]$, where $\sigma$ is the standard deviation of the running time over the 100 repetitions. (A sketch of this timing protocol appears after the table.)
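
As context for the Pseudocode row: Algorithm 1 and Algorithm 2 replace the sequential application of Householder reflections with a faster blocked scheme. The sketch below shows only the sequential baseline computation, assuming the standard parameterization $H_i = I - 2 v_i v_i^\top / \|v_i\|^2$; the function name and tensor layout are illustrative assumptions, not the authors' code.

```python
import torch

def sequential_householder_forward(V: torch.Tensor, X: torch.Tensor) -> torch.Tensor:
    """Apply the product H_1 H_2 ... H_d of Householder reflections to a mini-batch X.

    V: (d, d) tensor whose i-th row is the Householder vector v_i (illustrative layout).
    X: (d, m) mini-batch.
    Each H_i = I - 2 v_i v_i^T / ||v_i||^2 is applied as a rank-1 update, without
    materializing the d x d matrix.
    """
    out = X
    d = V.shape[0]
    # d sequential, vector-level steps: this serial dependency is the bottleneck
    # that the paper's blocked algorithm reorganizes into larger matrix products.
    for i in range(d - 1, -1, -1):
        v = V[i].unsqueeze(1)                   # (d, 1)
        beta = 2.0 / (v.T @ v)                  # (1, 1): 2 / ||v_i||^2
        out = out - beta * (v @ (v.T @ out))    # rank-1 update, O(d*m) per reflection
    return out
```

Looping from i = d - 1 down to 0 applies H_d to X first, so the overall result is H_1 ··· H_d X; the ordering convention here is an assumption.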
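
Since no library versions are given (Software Dependencies row), a reproduction would need to record its own environment. A minimal sketch using standard PyTorch version attributes; the output format is only an illustration:

```python
import torch

# Log the framework, CUDA build, and GPU actually used for the timing runs.
print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```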
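
The Experiment Setup row describes the paper's timing protocol (100 repetitions, mean $\mu$ with error bars $[\mu - \sigma, \mu + \sigma]$). The sketch below is one way such a measurement could be set up in PyTorch with CUDA synchronization; `step_fn` is a hypothetical stand-in for one gradient-descent step and is not taken from the paper's code.

```python
import time
import torch

def time_gradient_step(step_fn, repeats: int = 100):
    """Run `step_fn` (one gradient-descent step) `repeats` times and time it on the GPU.

    Returns (mean, std) of the wall-clock time per step in seconds, matching the
    mu and sigma used for the paper's error bars [mu - sigma, mu + sigma].
    """
    times = []
    for _ in range(repeats):
        torch.cuda.synchronize()               # finish any previously queued GPU work
        start = time.perf_counter()
        step_fn()                              # forward + backward + parameter update
        torch.cuda.synchronize()               # wait for this step's kernels to complete
        times.append(time.perf_counter() - start)
    t = torch.tensor(times)
    return t.mean().item(), t.std().item()

# Illustrative usage with the paper's sizes: m = 32 and d = k * 64 for k = 1, ..., 48.
# The quadratic loss below is a placeholder, not the operation benchmarked in the paper.
# d, m = 1024, 32
# W = torch.randn(d, d, device="cuda", requires_grad=True)
# X = torch.randn(d, m, device="cuda")
# opt = torch.optim.SGD([W], lr=1e-3)
# def step():
#     opt.zero_grad()
#     (W @ X).pow(2).sum().backward()
#     opt.step()
# mu, sigma = time_gradient_step(step)
```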