Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Random Features for Shift-Invariant Kernels with Moment Matching

Authors: Weiwei Shen, Zhihui Yang, Jun Wang

AAAI 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our extensive empirical studies and comparisons with several highly competitive peer methods verify the superiority of the proposed algorithm in Gram matrix approximation and generalization errors in regression. For validation, we provide detailed theoretical proofs and empirical comparisons with six state-of-the-art sampling methods across four standard benchmarks.
Researcher Affiliation	Collaboration	School of Computer Science and Software Engineering East China Normal University, Shanghai, China GE Global Research Center, Niskayuna, NY, USA
Pseudocode	Yes	Algorithm 1 Random Features with Moment Matching
Open Source Code	No	The paper does not provide concrete access to source code for the methodology described.
Open Datasets	Yes	Data: Four benchmark datasets with relatively high dimensions are examined in our experiments: (a) YP90 with 10000 and 2000 90-dimensional data points for training and testing, respectively; (b) QM274 with 6000 and 1165 274-dimensional data points for training and testing, respectively; (c) MNIST300 with 8000 and 2000 300-dimensional data points for training and testing, respectively; and (d) LR500 with 8000 and 2000 500-dimensional data points for training and testing, respectively.
Dataset Splits	Yes	Data: Four benchmark datasets with relatively high dimensions are examined in our experiments: (a) YP90 with 10000 and 2000 90-dimensional data points for training and testing, respectively; (b) QM274 with 6000 and 1165 274-dimensional data points for training and testing, respectively; (c) MNIST300 with 8000 and 2000 300-dimensional data points for training and testing, respectively; and (d) LR500 with 8000 and 2000 500-dimensional data points for training and testing, respectively.
Hardware Specification	No	The paper discusses computational costs but does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, memory amounts) used for running its experiments.
Software Dependencies	No	The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup	Yes	We specify the band width σ as the average distance of all data points to their tenth nearest neighbors unless otherwise stated. We first implement principal component decomposition on 4096 sampled random features. Then, we build principal component regression models by all 4096 principal components and by the first 512 principal components, respectively.
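
The setup quoted in the Experiment Setup row can be illustrated with a minimal NumPy sketch. It is not the authors' code, which the paper does not release. Several parts are assumptions made for illustration: the kernel is taken to be Gaussian, the random features are drawn by plain Monte Carlo sampling (standard random Fourier features) rather than by the paper's moment-matching Algorithm 1, and all function names are hypothetical. Only the tenth-nearest-neighbor bandwidth rule and the principal component regression step follow the quoted text.

```python
import numpy as np


def knn_bandwidth(X, k=10):
    """Average Euclidean distance from each point to its k-th nearest neighbor."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)  # clamp tiny negatives caused by round-off
    # After sorting, column 0 is the point itself (distance 0),
    # so column k holds the k-th nearest neighbor.
    kth = np.sort(d2, axis=1)[:, k]
    return float(np.mean(np.sqrt(kth)))


def sample_gaussian_rff(dim, n_features, sigma, rng):
    """Plain Monte Carlo frequencies for exp(-||x - y||^2 / (2 sigma^2)).

    Assumption: this is standard random Fourier feature sampling,
    NOT the moment-matching scheme of the paper's Algorithm 1.
    """
    W = rng.normal(scale=1.0 / sigma, size=(dim, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return W, b


def rff_map(X, W, b):
    """Map inputs to random features z(x) = sqrt(2 / D) * cos(W^T x + b)."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)


def pcr_fit_predict(Z_train, y_train, Z_test, n_components):
    """Principal component regression on the leading n_components."""
    mu = Z_train.mean(axis=0)
    y_mean = y_train.mean()
    # Rows of Vt are the principal directions of the centered features.
    _, _, Vt = np.linalg.svd(Z_train - mu, full_matrices=False)
    V = Vt[:n_components].T
    scores = (Z_train - mu) @ V
    coef, *_ = np.linalg.lstsq(scores, y_train - y_mean, rcond=None)
    return (Z_test - mu) @ V @ coef + y_mean
```

To mirror the quoted configuration, one would compute `sigma = knn_bandwidth(X_train, k=10)`, draw 4096 features with `sample_gaussian_rff`, and call `pcr_fit_predict` once with `n_components=4096` and once with `n_components=512`. The dense pairwise-distance matrix is adequate for datasets of the quoted size (up to 10000 training points) but grows quadratically in memory.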