Robust Gradient-Based Markov Subsampling

Authors: Tieliang Gong, Quanhan Xi, Chen Xu (pp. 4004-4011)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To assess the performance of GMS, we conduct experiments on both simulation studies and real data examples. All numerical studies are conducted in software R on Compute Canada clusters with 2.1 GHz CPUs and 128 GB memory. In simulation studies, we generate the data by y = Xβ + ε, where the n × d design matrix X is generated by a mixture of Gaussian distributions. Due to space limitation, we only show the results for the setting n = 1M, d = 500. Other results are given in the supplementary material. Figs. 3 and 4 record the boxplots of the empirical estimation error (EE) over 50 repetitions. The mean and standard deviation of EE are reported in Tables 1 and 2.
Researcher Affiliation | Academia | Tieliang Gong, Quanhan Xi, Chen Xu; Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON, K1N6N5, Canada
Pseudocode | Yes | Algorithm 1: Robust Gradient-based Markov Subsampling
Open Source Code | No | The paper does not provide any links or explicit statements about the availability of its source code.
Open Datasets | Yes | Online News Popularity (n = 39797, d = 61), Wave Energy Converters (n = 288000, d = 32) and Poker Hands (n = 25010, d = 11). Footnote 1 of the paper refers to https://archive.ics.uci.edu/ml/datasets.php
Dataset Splits | No | The paper describes a subsampling strategy for estimation but does not provide explicit training, validation, and test dataset splits in the conventional sense (e.g., percentages or counts for each split).
Hardware Specification | Yes | All numerical studies are conducted in software R on Compute Canada clusters with 2.1 GHz CPUs and 128 GB memory.
Software Dependencies | No | The paper mentions 'software R' but does not specify a version number or any specific R packages with version numbers.
Experiment Setup | Yes | In all experiments, the subsample size is set by n_sub = sr × n, where sr represents the sampling ratio. We set sr = 0.001, 0.005, 0.01 for each model. If required, a pilot estimator is calculated by uniform subsampling of size n_0 = n_sub.
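
The subsample-size rule quoted in the Experiment Setup entry can be illustrated with a small Python sketch. It is not the authors' code (the paper reports using R and releases no source). The function names `subsample_size` and `uniform_pilot_indices` are hypothetical, and two details are assumptions the source does not state: rounding sr × n to the nearest integer, and drawing the pilot subsample without replacement. The GMS algorithm itself is not reproduced here.

```python
import random

# Sampling ratios quoted in the Experiment Setup entry.
SAMPLING_RATIOS = (0.001, 0.005, 0.01)


def subsample_size(n, sr):
    """Return n_sub = sr * n as an integer.

    Assumption: round to the nearest integer and keep at least one row;
    the source does not say how non-integer products are handled.
    """
    return max(1, round(sr * n))


def uniform_pilot_indices(n, n0, seed=0):
    """Draw n0 distinct row indices uniformly at random from range(n).

    Assumption: sampling without replacement; the source only says
    "uniform subsampling of size n0 = nsub".
    """
    rng = random.Random(seed)
    return rng.sample(range(n), n0)


if __name__ == "__main__":
    n = 1_000_000  # simulation setting n = 1M quoted above
    for sr in SAMPLING_RATIOS:
        n_sub = subsample_size(n, sr)
        pilot = uniform_pilot_indices(n, n_sub)
        print(f"sr={sr}: n_sub={n_sub}, pilot rows drawn={len(pilot)}")
```

For the simulated setting n = 1M, the three ratios give subsample sizes of 1000, 5000 and 10000. For the smaller real datasets the product is not an integer (for example 0.001 × 39797 = 39.797), which is why the rounding convention has to be assumed.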