Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels

Authors: Haim Avron, Vikas Sindhwani, Jiyan Yang, Michael W. Mahoney

JMLR 2016

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "Our theoretical analyses are complemented with empirical results that demonstrate the effectiveness of classical and adaptive QMC techniques for this problem. In this section we report experiments with both classical QMC sequences and adaptive sequences learnt from box discrepancy minimization."

Researcher Affiliation | Collaboration | Haim Avron (School of Mathematical Sciences, Tel Aviv University, Tel Aviv 69978, Israel); Vikas Sindhwani (Google Research, New York, NY 10011, USA); Jiyan Yang (Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, USA); Michael W. Mahoney (International Computer Science Institute and Department of Statistics, University of California at Berkeley, Berkeley, CA 94720, USA)

Pseudocode | Yes | "Algorithm 1: Quasi-Random Fourier Features. Input: shift-invariant kernel k, size s. Output: feature map Ψ̂(x) : R^d → C^s."

Open Source Code | No | "For Halton and Sobol', we use the implementation available in MATLAB. For Lattice Rules and Digital Nets, we use publicly available implementations." This refers to third-party software, not code released by the authors for their specific methodology.

Open Datasets | Yes | "We examine four data sets: cpu (6,554 examples, 21 dimensions), census (a subset chosen randomly with 5,000 examples, 119 dimensions), USPST (1,506 examples, 250 dimensions after PCA) and MNIST (a subset chosen randomly with 5,000 examples, 250 dimensions after PCA)." These are well-known, publicly available benchmark datasets.

Dataset Splits | Yes | "For each data set, we performed 5-fold cross-validation when using random Fourier features (MC sequence) to set the bandwidth σ, and then used the same σ for all other sequences. The ridge parameter is set by the optimal value we obtain via 5-fold cross-validation on the training set by using the MC sequence."

Hardware Specification | No | The paper discusses the running times of experiments but does not provide specific hardware details such as CPU/GPU models, memory, or other machine specifications.

Software Dependencies | No | The paper mentions using MATLAB for Halton and Sobol' sequences, and CVX for optimization, but does not provide version numbers for these software components.

Experiment Setup | Yes | "The ridge parameter is set by the optimal value we obtain via 5-fold cross-validation on the training set by using the MC sequence. For each data set, we performed 5-fold cross-validation when using random Fourier features (MC sequence) to set the bandwidth σ, and then used the same σ for all other sequences."
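The quasi-random Fourier feature map named in the Pseudocode row can be sketched in a few lines. The sketch below is an illustrative reconstruction, not the authors' code: it assumes a Gaussian kernel, uses SciPy's scrambled Halton sequence as the QMC point set, and the function name, parameters, and normalization are our own choices.

```python
import numpy as np
from scipy.stats import norm, qmc

def qmc_fourier_features(X, s, sigma, seed=0):
    """Quasi-random Fourier feature map for the Gaussian kernel
    k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).

    Frequencies are obtained by pushing a scrambled Halton sequence
    through the Gaussian inverse CDF, instead of i.i.d. Monte Carlo
    sampling. Returns the complex feature map Psi_hat(x) in C^s.
    """
    n, d = X.shape
    halton = qmc.Halton(d=d, scramble=True, seed=seed)
    u = halton.random(s)                       # s low-discrepancy points in (0,1)^d
    u = np.clip(u, 1e-12, 1 - 1e-12)           # guard the inverse CDF
    W = norm.ppf(u) / sigma                    # frequencies ~ N(0, sigma^-2 I)
    # Psi_hat(x)_j = exp(i w_j^T x) / sqrt(s), so Psi_hat Psi_hat^* ~ K
    return np.exp(1j * (X @ W.T)) / np.sqrt(s)

# Usage: compare the QMC kernel estimate with the exact Gaussian kernel.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
Psi = qmc_fourier_features(X, s=2048, sigma=1.0)
K_hat = (Psi @ Psi.conj().T).real
K_true = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / 2.0)
```

With s = 2048 features the entrywise error of `K_hat` against `K_true` is already small; the paper's point is that low-discrepancy sequences reach a given accuracy with fewer features than plain Monte Carlo.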