Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Robust Gradient-Based Markov Subsampling
Authors: Tieliang Gong, Quanhan Xi, Chen Xu (pp. 4004-4011)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To assess the performance of GMS, we conduct experiments on both simulation studies and real data examples. All numerical studies are conducted in software R on Compute Canada clusters with 2.1 GHz CPUs and 128 GB memory. In simulation studies, we generate the data by y = Xβ + ε, where the n × d design matrix X is generated by a mixture of Gaussian distributions. Due to space limitation, we only show the results for the setting n = 1M, d = 500. Other results are given in the supplementary material. Figs. 3 and 4 record the boxplots based on 50 replications of the empirical estimation error (EE). The mean and standard deviation of EE are reported in Tables 1 and 2. |
| Researcher Affiliation | Academia | Tieliang Gong, Quanhan Xi, Chen Xu, Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON, K1N 6N5, Canada |
| Pseudocode | Yes | Algorithm 1 Robust Gradient-based Markov Subsampling |
| Open Source Code | No | The paper does not provide any links or explicit statements about the availability of its source code. |
| Open Datasets | Yes | Online News Popularity (n = 39797, d = 61), Wave Energy Converters (n = 288000, d = 32) and Poker Hands (n = 25010, d = 11).[1] Footnote 1: https://archive.ics.uci.edu/ml/datasets.php |
| Dataset Splits | No | The paper describes a subsampling strategy for estimation but does not provide explicit training, validation, and test dataset splits in the conventional sense (e.g., percentages or counts for each split). |
| Hardware Specification | Yes | All numerical studies are conducted in software R on Compute Canada clusters with 2.1 GHz CPUs and 128 GB memory. |
| Software Dependencies | No | The paper mentions 'software R' but does not specify a version number or any specific R packages with version numbers. |
| Experiment Setup | Yes | In all experiments, the subsample size is set by n_sub = sr × n, where sr represents the sampling ratio. We set sr = 0.001, 0.005, 0.01 for each model. If required, a pilot estimator is calculated by uniform subsampling of size n_0 = n_sub. |
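The simulation and setup rows above can be illustrated with a small, scaled-down sketch in plain Python. It is not the paper's R code and does not implement GMS (Algorithm 1); it only shows the pieces the report describes: data drawn as y = Xβ + ε with Gaussian-mixture rows of X, the subsample size n_sub = sr × n, and a pilot least-squares estimator from a uniform subsample of size n_0 = n_sub. For the paper's setting n = 1M, the three ratios give n_sub = 1000, 5000 and 10000. The mixture weights and means, β, the noise level, the rounding rule for n_sub, and the squared-error form of EE are all assumptions made for illustration, as are the helper names (`subsample_size`, `simulate`, `ols`, `uniform_pilot`, `estimation_error`).

```python
import random


def subsample_size(n: int, sr: float) -> int:
    """n_sub = sr * n; rounding to the nearest integer is an assumption."""
    return max(1, round(sr * n))


def simulate(n, beta, weights, means, sd=1.0, noise_sd=1.0, seed=0):
    """Draw y = X beta + eps, each row of X from a Gaussian mixture.

    Mixture weights/means, sd and noise_sd are hypothetical choices,
    not the settings used in the paper.
    """
    rng = random.Random(seed)
    d = len(beta)
    X, y = [], []
    for _ in range(n):
        # Pick a mixture component, then draw every coordinate around its mean.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        row = [rng.gauss(means[k], sd) for _ in range(d)]
        X.append(row)
        y.append(sum(b * x for b, x in zip(beta, row)) + rng.gauss(0.0, noise_sd))
    return X, y


def ols(X, y):
    """Least squares via the normal equations X^T X b = X^T y,
    solved by Gaussian elimination with partial pivoting."""
    d = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(d)]
    M = [A[i] + [rhs[i]] for i in range(d)]  # augmented matrix [A | rhs]
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, d):
            f = M[r][c] / M[c][c]
            for j in range(c, d + 1):
                M[r][j] -= f * M[c][j]
    b = [0.0] * d
    for i in range(d - 1, -1, -1):  # back substitution
        b[i] = (M[i][d] - sum(M[i][j] * b[j] for j in range(i + 1, d))) / M[i][i]
    return b


def uniform_pilot(X, y, n0, seed=1):
    """Pilot estimator: OLS on n0 rows drawn uniformly without replacement."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(X)), n0)
    return ols([X[i] for i in idx], [y[i] for i in idx])


def estimation_error(b_hat, beta):
    """Squared Euclidean error; the paper's exact EE definition may differ."""
    return sum((a - b) ** 2 for a, b in zip(b_hat, beta))


if __name__ == "__main__":
    beta = [1.0, -2.0, 0.5]
    X, y = simulate(20000, beta, weights=[0.5, 0.5], means=[-2.0, 2.0])
    for sr in (0.001, 0.005, 0.01):
        n0 = subsample_size(len(X), sr)  # pilot size n_0 = n_sub
        b = uniform_pilot(X, y, n0)
        print(sr, n0, estimation_error(b, beta))
```

Repeating the `__main__` loop over 50 seeds and summarizing the errors would mirror the boxplots and mean/standard-deviation tables the report mentions, again only as an illustration of the protocol rather than a reproduction of its numbers.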