Tensor Normal Training for Deep Learning Models
Authors: Yi Ren, Donald Goldfarb
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments were run on a machine with one V100 GPU and eight Xeon Gold 6248 CPUs using PyTorch [37]. Each algorithm was run using the best hyper-parameters, determined by an appropriate grid search (specified below), and 5 different random seeds. In Figures 2 and 3 the performance of each algorithm is plotted: the solid curves give results obtained by averaging the 5 different runs, and the shaded area depicts the standard deviation range for these runs. (A seed-averaging sketch follows this table.) |
| Researcher Affiliation | Academia | Yi Ren, Donald Goldfarb Department of Industrial Engineering and Operations Research Columbia University New York, NY 10027 {yr2322, goldfarb}@columbia.edu |
| Pseudocode | Yes | Algorithm 1 Generic Tensor-Normal Training (TNT) |
| Open Source Code | Yes | Our code is available at https://github.com/renyiryry/tnt_neurips_2021. |
| Open Datasets | Yes | We first compared the optimization performance of each algorithm on two autoencoder problems [21] with datasets MNIST [27] and FACES, which were also used in [32, 34, 5, 15] as benchmarks to compare different algorithms. |
| Dataset Splits | No | The paper mentions "validation accuracy" for CNN models but does not provide specific details on how the training, validation, and test sets were split (e.g., percentages, counts, or methodology for creating the splits). |
| Hardware Specification | Yes | Our experiments were run on a machine with one V100 GPU and eight Xeon Gold 6248 CPUs using PyTorch [37]. |
| Software Dependencies | No | The paper mentions using 'PyTorch [37]' but does not provide a specific version number for PyTorch or other software dependencies. |
| Experiment Setup | Yes | For each algorithm, we conducted a grid search on the learning rate and damping value based on the criteria of minimal training loss. We set the Fisher matrix update frequency T1 = 1 and inverse update frequency T2 = 20 for all of the second-order methods. (A training-loop sketch follows this table.) |
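
The seed-averaging protocol quoted in the Research Type row can be made concrete with a short sketch. The following is a minimal illustration, not the authors' code: `run_training` is a hypothetical placeholder for one seeded training run, and the plotting mirrors the paper's description of a solid mean curve over 5 runs with a shaded standard-deviation band.

```python
# Minimal sketch (not the authors' code) of the 5-seed averaging protocol:
# run each algorithm with 5 random seeds, then plot the mean curve with a
# +/- one-standard-deviation shaded band, as described in the paper.
import numpy as np
import matplotlib.pyplot as plt

def run_training(seed: int, epochs: int = 100) -> np.ndarray:
    """Hypothetical placeholder: returns a per-epoch training-loss curve."""
    rng = np.random.default_rng(seed)
    return np.exp(-0.05 * np.arange(epochs)) + 0.01 * rng.standard_normal(epochs)

curves = np.stack([run_training(seed) for seed in range(5)])  # shape (5, epochs)
mean, std = curves.mean(axis=0), curves.std(axis=0)

epochs = np.arange(curves.shape[1])
plt.plot(epochs, mean, label="mean over 5 seeds")            # solid curve
plt.fill_between(epochs, mean - std, mean + std, alpha=0.3)  # shaded std band
plt.xlabel("epoch")
plt.ylabel("training loss")
plt.legend()
plt.show()
```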
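Similarly, the Experiment Setup row fixes two update frequencies for all second-order methods. The skeleton below is a hypothetical sketch of how T1 = 1 (Fisher statistics update) and T2 = 20 (inverse update) might interact with a grid search over learning rate and damping. The names in the comments (`update_fisher_stats`, `recompute_inverse`, `preconditioned_step`) are invented placeholders, and the loss is a toy stand-in; the actual TNT updates (Algorithm 1) are given in the paper and the released repository.

```python
# Hypothetical skeleton (not the paper's implementation) combining the quoted
# settings: Fisher statistics refreshed every T1 = 1 steps, their inverse
# recomputed every T2 = 20 steps, and a grid search over (lr, damping)
# selected by minimal training loss.
import itertools

T1, T2 = 1, 20  # update frequencies quoted from the paper

def train(lr: float, damping: float, num_steps: int = 1000) -> float:
    """Hypothetical stand-in for one training run; returns final training loss."""
    loss = 1.0
    for step in range(1, num_steps + 1):
        if step % T1 == 0:
            pass  # update_fisher_stats(): refresh the curvature statistics
        if step % T2 == 0:
            pass  # recompute_inverse(damping): invert the damped factors
        # preconditioned_step(lr): apply the preconditioned gradient update.
        loss *= 1.0 - min(lr, 0.5)  # placeholder decay so the sketch runs;
        # damping is unused in this toy stand-in.
    return loss

# Grid search on learning rate and damping by minimal training loss.
grid_lr = [1e-1, 1e-2, 1e-3]
grid_damping = [1e-2, 1e-3, 1e-4]
best = min(itertools.product(grid_lr, grid_damping),
           key=lambda cfg: train(*cfg))
print("best (lr, damping):", best)
```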