Block Low-Rank Preconditioner with Shared Basis for Stochastic Optimization

Authors: Jui-Nan Yen, Sai Surya Duvvuri, Inderjit Dhillon, Cho-Jui Hsieh

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results on a deep autoencoder and a Transformer benchmark demonstrate that the proposed method outperforms first-order methods with slightly more time and memory usage, while also achieving competitive or superior performance compared to other second-order methods with less time and memory usage.
Researcher Affiliation | Collaboration | Jui-Nan Yen (UCLA, juinanyen@cs.ucla.edu); Sai Surya Duvvuri (UT Austin, saisurya@cs.utexas.edu); Inderjit S. Dhillon (Google and UT Austin, inderjit@cs.utexas.edu); Cho-Jui Hsieh (Google and UCLA, chohsieh@cs.ucla.edu)
Pseudocode | Yes | Algorithm 1: Shared-Basis Low Rank Block-Diagonal Adagrad (an illustrative sketch of this style of preconditioner follows the table).
Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described in this paper, nor does it provide a direct link to a source-code repository for their methodology.
Open Datasets | Yes | We evaluate the performance using a standard Autoencoder benchmark [27] on the MNIST dataset [8] and a larger Transformer model [30] on the Universal Dependencies dataset [24].
Dataset Splits | No | The paper refers to 'validation performance' and 'validation error' and uses standard benchmarks, but it does not explicitly state the training/validation/test splits (e.g., percentages or sample counts) needed for reproduction.
Hardware Specification | Yes | For the autoencoder benchmark, we conduct 180 trials of random search on one NVIDIA RTX 2080Ti GPU with 11GB memory. For the Transformer benchmark, we conduct 60 trials of random search on one NVIDIA RTX A6000 GPU with 48GB memory.
Software Dependencies | No | The paper mentions using the Google Flax repository but does not provide specific version numbers for Flax or any other key software components, libraries, or solvers.
Experiment Setup | Yes | We adopt k = 32 as the default rank for our methods. For randomized SVD, we set the oversampling parameter to 0 and the number of iterations to 1. Similar to Shampoo, we use the grafting technique [2] in our method. We set the grafting type to RMSPROP_NORMALIZED. The batch size is 1000. A linear warmup of 5 epochs is used for learning rate scheduling, followed by a linear decay to 0. (A sketch of the schedule and randomized SVD settings follows the table.)
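
The pseudocode row names Algorithm 1, "Shared-Basis Low Rank Block-Diagonal Adagrad". The snippet below is not the authors' Algorithm 1; it is a minimal sketch of the general idea, assuming each block's Adagrad statistic is approximated as U S_i U^T with a single orthonormal basis U shared across blocks. The function name, signature, and epsilon handling are illustrative assumptions.

```python
import jax.numpy as jnp

def shared_basis_precondition(g, U, S, eps=1e-6):
    """Apply (U S U^T + eps*I)^(-1/2) to one block gradient g.

    g : (d,)   gradient of a single parameter block
    U : (d, k) shared orthonormal basis (common to all blocks)
    S : (k, k) this block's accumulated second-moment statistic in that basis

    Because U has orthonormal columns, the inverse square root splits into a
    k-dimensional piece inside the basis and a scaled identity on the
    orthogonal complement:
      (U S U^T + eps*I)^(-1/2) g
        = U (S + eps*I)^(-1/2) (U^T g) + eps^(-1/2) (g - U U^T g)
    """
    c = U.T @ g                                  # coordinates in the shared basis
    w, V = jnp.linalg.eigh(S)                    # eigendecomposition of the k x k statistic
    inv_sqrt_c = V @ ((V.T @ c) / jnp.sqrt(w + eps))
    residual = g - U @ c                         # component outside the basis
    return U @ inv_sqrt_c + residual / jnp.sqrt(eps)
```

In a full optimizer loop one would also accumulate each S_i from projected gradients and periodically refresh the shared basis U (e.g., via randomized SVD); those steps are omitted here.

The experiment-setup row fixes a few concrete choices: rank k = 32, randomized SVD with oversampling 0 and one iteration, batch size 1000, and a 5-epoch linear warmup followed by linear decay to 0. Below is a minimal JAX/optax sketch of those two pieces; `train_size`, `num_epochs`, and `peak_lr` are placeholders rather than values from the paper, and grafting is not shown.

```python
import jax
import jax.numpy as jnp
import optax

def randomized_svd(A, key, rank=32, oversample=0, n_iter=1):
    """Rank-`rank` randomized SVD with the stated settings:
    oversampling parameter 0 and a single power iteration."""
    m, n = A.shape
    omega = jax.random.normal(key, (n, rank + oversample))
    Y = A @ omega                                # sketch of the range of A
    for _ in range(n_iter):                      # one power iteration
        Y = A @ (A.T @ Y)
    Q, _ = jnp.linalg.qr(Y)                      # orthonormal range approximation
    U_small, s, Vt = jnp.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]

# Learning-rate schedule: linear warmup for 5 epochs at batch size 1000,
# then linear decay to 0 over the remaining steps.
train_size, num_epochs, batch_size, peak_lr = 60_000, 100, 1000, 1e-3  # placeholders
steps_per_epoch = train_size // batch_size
warmup_steps = 5 * steps_per_epoch
total_steps = num_epochs * steps_per_epoch
schedule = optax.join_schedules(
    schedules=[
        optax.linear_schedule(0.0, peak_lr, warmup_steps),
        optax.linear_schedule(peak_lr, 0.0, total_steps - warmup_steps),
    ],
    boundaries=[warmup_steps],
)
```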
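
The resulting `schedule` can be passed wherever optax expects a learning-rate callable (e.g., as the learning rate of an optimizer), and `randomized_svd` takes a `jax.random.PRNGKey` for the Gaussian test matrix.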