Gradient-based Sampling: An Adaptive Importance Sampling for Least-squares

Authors: Rong Zhu

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Theoretically, we establish an error bound analysis of general importance sampling with respect to the LS solution from the full data. The result establishes the improved performance of our gradient-based sampling. Synthetic and real data sets are used to empirically show that gradient-based sampling has an obvious advantage over existing sampling methods in the two aspects of statistical efficiency and computational saving.
Researcher Affiliation | Academia | Rong Zhu, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China. rongzhu@amss.cas.cn
Pseudocode | Yes | Algorithm 1: Gradient-based sampling algorithm (a sketch of this algorithm follows after the table).
Open Source Code | No | The paper does not provide a statement about the release of source code or a link to a code repository.
Open Datasets | Yes | Detailed numerical experiments are conducted to compare the excess risk of β based on L2 loss against the expected subsample size r for different synthetic datasets and real data examples. In this section, we report several representative studies. [...] on two UCI datasets: CASP (n = 45730, d = 9) and Online News Popularity (NEWS) (n = 39644, d = 59).
Dataset Splits | No | The paper calculates the MSE of subsample estimates for approximating the full-sample LS solution and varies the sampling ratio, but it does not specify explicit training, validation, and test splits in the traditional machine-learning sense.
Hardware Specification | Yes | We perform the computation by R software on a PC with a 3 GHz Intel i7 processor, 8 GB memory, and the OS X operating system.
Software Dependencies | No | The paper mentions 'R software' but does not specify its version number or any other software dependencies with their versions.
Experiment Setup | Yes | We calculate the full-sample LS solution β̂n for each dataset, and repeatedly apply the various sampling methods B = 1000 times to obtain subsample estimates βb for b = 1, ..., B. We set d = 100, and n among 20K, 50K, 100K, 200K, 500K. Two sampling ratios r/n are considered: 0.01 and 0.05. For GRAD, we set r0 = r to get the pilot estimate β0. (A minimal sketch of this evaluation loop is given after the algorithm sketch below.)
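Since the paper provides only pseudocode (Algorithm 1) and releases no code, the following is a minimal sketch of gradient-based sampling for least squares in Python/NumPy rather than the author's R implementation. The function name `gradient_based_sampling_ls`, the with-replacement sampling scheme, and the 1/sqrt(r·πi) row reweighting are our reconstruction from the paper's description, not a verified reproduction.

```python
# Minimal sketch of Algorithm 1 (gradient-based sampling), assuming
# with-replacement importance sampling; names here are our own.
import numpy as np

def gradient_based_sampling_ls(X, y, r, r0=None, rng=None):
    """Approximate the full-data LS solution via gradient-based sampling.

    X : (n, d) design matrix, y : (n,) responses,
    r : subsample size, r0 : pilot subsample size (the paper sets r0 = r).
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    r0 = r if r0 is None else r0

    # Step 1: pilot estimate beta0 from a uniform subsample of size r0.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0, *_ = np.linalg.lstsq(X[idx0], y[idx0], rcond=None)

    # Step 2: sampling probabilities proportional to per-point gradient
    # norms ||g_i|| = |y_i - x_i' beta0| * ||x_i||.
    resid = y - X @ beta0
    grad_norms = np.abs(resid) * np.linalg.norm(X, axis=1)
    probs = grad_norms / grad_norms.sum()

    # Step 3: draw r points with replacement and solve a weighted LS,
    # scaling each sampled row by 1 / sqrt(r * pi_i) so the weighted
    # objective is an unbiased estimate of the full-data LS objective.
    idx = rng.choice(n, size=r, replace=True, p=probs)
    w = 1.0 / np.sqrt(r * probs[idx])
    beta, *_ = np.linalg.lstsq(w[:, None] * X[idx], w * y[idx], rcond=None)
    return beta
```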
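And a hedged sketch of the evaluation loop quoted in the Experiment Setup row: compute the full-sample LS solution once, repeat the sampling estimator B = 1000 times, and report the empirical MSE of the subsample estimates around β̂n. The synthetic data generator here is illustrative only; the paper's exact covariate and error distributions vary across its experiments.

```python
# Evaluation-loop sketch; reuses gradient_based_sampling_ls from the
# block above. Data generation is illustrative, not the paper's design.
import numpy as np

rng = np.random.default_rng(0)
n, d, B = 20_000, 100, 1000
ratio = 0.01                       # sampling ratio r/n (paper: 0.01 and 0.05)
r = int(ratio * n)

X = rng.standard_normal((n, d))    # assumed covariate distribution
beta_true = rng.standard_normal(d)
y = X @ beta_true + rng.standard_normal(n)

# Full-sample LS solution, the target the subsample estimates approximate.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Empirical MSE of B repeated subsample estimates around beta_full.
sq_errs = [
    np.sum((gradient_based_sampling_ls(X, y, r, rng=rng) - beta_full) ** 2)
    for _ in range(B)
]
print("empirical MSE:", np.mean(sq_errs))
```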