Fast and Accurate Stochastic Gradient Estimation

Authors: Beidi Chen, Yingchen Xu, Anshumali Shrivastava

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the effectiveness of our proposal with experiments on linear models as well as the non-linear BERT, which is a recent popular deep learning based language representation model." and, from Section 3 (Experiments): "Linear regression is a basic and commonly used supervised machine learning algorithm for prediction. Deep learning models recently become popular for their state-of-the-art performance on Natural Language Processing (NLP) and also Computer Vision tasks. Therefore, we chose both linear regression and deep learning models as the target experiment tasks to examine the effectiveness of our algorithm."
Researcher Affiliation | Academia | Beidi Chen, Rice University, Houston, Texas (beidi.chen@rice.edu); Yingchen Xu, Rice University, Houston, Texas (yx26@rice.edu); Anshumali Shrivastava, Rice University, Houston, Texas (anshumali@rice.edu)
Pseudocode | Yes | Algorithm 1 (the hash-table assignment algorithm) and Algorithm 2 (the LSH-Sampled Stochastic Gradient Descent (LGD) algorithm); a minimal sketch of the combined procedure is given after this table.
Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the proposed method, nor a link to a code repository.
Open Datasets | Yes | "Dataset: We used three large regression [datasets], Year Prediction MSD [18], Slice [18], UJIIndoorLoc [27], and two NLP benchmarks, MRPC [13], RTE [28]."
Dataset Splits | No | The paper refers to training and testing data for the datasets used (Figure 4) and to parameters such as "3 epochs with batch size 32", but it does not specify explicit validation-set sizes or split percentages.
Hardware Specification | No | "We do not explore the time-wise convergence comparison between LGD and SGD in current tasks because BERT is implemented in Tensorflow [1] and Pytorch [21] on GPU. We currently only have the CPU implementation of LSH." This mentions a GPU and a CPU but gives no specific hardware models or other details.
Software Dependencies | No | "We do not explore the time-wise convergence comparison between LGD and SGD in current tasks because BERT is implemented in Tensorflow [1] and Pytorch [21] on GPU." Software packages are named, but no version numbers are given.
Experiment Setup | Yes | "For each task, we ran fine-tunings for 3 epochs with batch size 32 and used Adam optimizer with initial learning rates 2e. As for LSH parameter, we chose K = 7, L = 10." and "We used fixed values K = 5 and L = 100 for all the datasets. l is the number of hash tables that have been searched before landing in a non-empty bucket in a query. In our experiments l is almost always as low as 1. L only affects preprocessing but not sampling. Our hash function was simhash (or signed random projections) and we used sparse random projections with sparsity 1/30 for speed. We tried a sweep of initial step size from 1e-5 to 1e-1 and choose the one that will lead to convergence with LGD and SGD." A sketch of this hash-function configuration also follows the table.
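
To make the pseudocode entry concrete, below is a minimal sketch of the LGD idea for least-squares regression: pre-process the examples into SimHash hash tables (Algorithm 1), then at each iteration query the tables with the current parameters so that examples whose transformed representation aligns with the query, a proxy for a larger gradient contribution, are more likely to be retrieved, and take an SGD step on the retrieved example (Algorithm 2). Function names, the asymmetric transform, and all constants here are illustrative assumptions, and the paper's inverse-probability reweighting of the sampled gradient is omitted.

```python
import numpy as np

K, L = 5, 10   # bits per hash code and number of tables (the paper uses K = 5, L = 100)

def simhash_code(planes, v):
    """K-bit signed-random-projection (SimHash) code of v, as a hashable key."""
    return (planes @ v > 0).tobytes()

def build_tables(data, planes_list):
    """Pre-processing: insert every (transformed) example into each of the L tables."""
    tables = [{} for _ in planes_list]
    for i, v in enumerate(data):
        for table, planes in zip(tables, planes_list):
            table.setdefault(simhash_code(planes, v), []).append(i)
    return tables

def lgd(X, y, steps=2000, lr=1e-2, seed=0):
    """Illustrative LSH-sampled gradient descent for the squared loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Asymmetric transform (assumption): storing [x_i, y_i] and querying with
    # [theta, -1] makes the query-data inner product the residual theta^T x_i - y_i.
    data = np.hstack([X, y[:, None]])
    planes_list = [rng.standard_normal((K, d + 1)) for _ in range(L)]
    tables = build_tables(data, planes_list)
    theta = np.zeros(d)
    for _ in range(steps):
        q = np.append(theta, -1.0)
        i = None
        # Probe tables until a non-empty bucket is hit (the paper reports ~1 probe).
        for table, planes in zip(tables, planes_list):
            bucket = table.get(simhash_code(planes, q), [])
            if bucket:
                i = int(bucket[rng.integers(len(bucket))])
                break
        if i is None:
            i = int(rng.integers(n))          # fall back to a uniform random sample
        grad = (X[i] @ theta - y[i]) * X[i]   # per-example squared-loss gradient
        theta -= lr * grad
    return theta

# Tiny synthetic usage example.
rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 10))
y = X @ rng.standard_normal(10) + 0.01 * rng.standard_normal(1000)
theta_hat = lgd(X, y)
print("train MSE:", float(np.mean((X @ theta_hat - y) ** 2)))
```

The small synthetic run at the end only checks that the sketch executes; it makes no claim about reproducing the paper's reported convergence behavior.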
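
The hash-function setting quoted in the Experiment Setup row, SimHash (signed random projections) with sparse projections of sparsity 1/30 and K = 5, L = 100 for the regression datasets, can be sketched as follows; the exact sparse-projection convention (which coordinates are nonzero and with what values) is an assumption, since the paper does not spell it out.

```python
import numpy as np

def sparse_srp_planes(dim, K=5, L=100, sparsity=1/30, seed=0):
    """Draw L sets of K sparse projection directions: each coordinate is nonzero
    with probability `sparsity` and carries a random sign when it is nonzero."""
    rng = np.random.default_rng(seed)
    nonzero = rng.random((L, K, dim)) < sparsity
    signs = rng.choice([-1.0, 1.0], size=(L, K, dim))
    return nonzero * signs                      # dense array of shape (L, K, dim)

def simhash_codes(planes, v):
    """K-bit SimHash code of v for each of the L tables (sign pattern of projections)."""
    return planes @ v > 0                       # boolean array of shape (L, K)

# Example with a 90-dimensional feature vector (YearPredictionMSD has 90 features).
planes = sparse_srp_planes(dim=90)
v = np.random.default_rng(1).standard_normal(90)
print(simhash_codes(planes, v).shape)           # -> (100, 5)
```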