Variance Reduction With Sparse Gradients

Authors: Melih Elibol, Lihua Lei, Michael I. Jordan

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, our algorithm consistently outperforms Spider Boost using various models on various tasks including image classification, natural language processing, and sparse matrix factorization.
Researcher Affiliation | Academia | Melih Elibol, Michael I. Jordan, University of California, Berkeley ({elibol,jordan}@cs.berkeley.edu); Lihua Lei, Stanford University (lihualei@stanford.edu)
Pseudocode | Yes | Algorithm 1: Spider Boost with Sparse Gradients.
Open Source Code | No | The paper does not contain an explicit statement about releasing open-source code or provide a link to a code repository.
Open Datasets | Yes | For datasets, we use CIFAR-10 (Krizhevsky et al.), SVHN (Netzer et al., 2011), and MNIST (LeCun & Cortes, 2010).
Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits, such as specific percentages or sample counts.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU or CPU models.
Software Dependencies | No | The paper mentions software such as TensorFlow and PyTorch but does not provide specific version numbers for these or other dependencies.
Experiment Setup | Yes | For all experiments, unless otherwise specified, we run Spider Boost and Sparse Spider Boost with a learning rate η = 0.1, large-batch size B = 1000, small-batch size b = 100, inner loop length of m = 10, memory decay factor of α = 0.5, and k1 and k2 both set to 5% of the total number of model parameters.
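
To make the quoted hyperparameters concrete, below is a minimal NumPy sketch of a SpiderBoost-style loop with top-k gradient sparsification, plugging in the values from the Experiment Setup row. It is an illustration under assumptions, not the authors' Algorithm 1: the names grad_fn, top_k, and sparse_spider_boost are hypothetical, and the paper's memory mechanism (how the decay factor α and the separate k1/k2 coordinate selections interact) is simplified here to a single top-k step.

    import numpy as np

    # Hyperparameter values quoted from the paper's experiment setup.
    eta = 0.1      # learning rate
    B = 1000       # large-batch size (checkpoint gradient every m steps)
    b = 100        # small-batch size (inner-loop correction)
    m = 10         # inner loop length
    k_frac = 0.05  # k1 = k2 = 5% of the total number of model parameters
    # alpha = 0.5 (memory decay factor) belongs to the paper's memory scheme,
    # which this simplified sketch does not model.

    def top_k(v, k):
        # Keep only the k largest-magnitude coordinates of v (assumed sparsifier).
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out = np.zeros_like(v)
        out[idx] = v[idx]
        return out

    def sparse_spider_boost(grad_fn, x0, n, T):
        # grad_fn(x, batch) -> stochastic gradient at x over the index batch;
        # n = number of training examples; T = total number of iterations.
        x_prev = x0.copy()
        x = x0.copy()
        v = np.zeros_like(x0)
        k = max(1, int(k_frac * x0.size))
        for t in range(T):
            if t % m == 0:
                # Large-batch gradient estimate, then sparsify.
                batch = np.random.choice(n, size=B, replace=False)
                v = top_k(grad_fn(x, batch), k)
            else:
                # SPIDER-style small-batch correction of the running estimate.
                batch = np.random.choice(n, size=b, replace=False)
                v = top_k(v + grad_fn(x, batch) - grad_fn(x_prev, batch), k)
            x_prev, x = x, x - eta * v
        return x

The sparsification step is the point of the method as the abstract describes it: keeping only a small fraction of gradient coordinates per update reduces per-iteration cost while the periodic large-batch estimate and SPIDER correction control the variance introduced by the small batches.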