Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels

Authors: Haim Avron, Vikas Sindhwani, Jiyan Yang, Michael W. Mahoney

JMLR 2016

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "Our theoretical analyses are complemented with empirical results that demonstrate the effectiveness of classical and adaptive QMC techniques for this problem. In this section we report experiments with both classical QMC sequences and adaptive sequences learnt from box discrepancy minimization."

Researcher Affiliation | Collaboration | Haim Avron (School of Mathematical Sciences, Tel Aviv University, Tel Aviv 69978, Israel); Vikas Sindhwani (Google Research, New York, NY 10011, USA); Jiyan Yang (Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, USA); Michael W. Mahoney (International Computer Science Institute and Department of Statistics, University of California at Berkeley, Berkeley, CA 94720, USA)

Pseudocode | Yes | "Algorithm 1: Quasi-Random Fourier Features. Input: shift-invariant kernel k, size s. Output: feature map Ψ̂(x) : R^d → C^s."

Open Source Code | No | "For Halton and Sobol', we use the implementation available in MATLAB. For Lattice Rules and Digital Nets, we use publicly available implementations." This refers to third-party software, not code released by the authors for their specific methodology.

Open Datasets | Yes | "We examine four data sets: cpu (6,554 examples, 21 dimensions), census (a subset chosen randomly with 5,000 examples, 119 dimensions), USPST (1,506 examples, 250 dimensions after PCA) and MNIST (a subset chosen randomly with 5,000 examples, 250 dimensions after PCA)." These are well-known, publicly available benchmark datasets.

Dataset Splits | Yes | "For each data set, we performed 5-fold cross-validation when using random Fourier features (MC sequence) to set the bandwidth σ, and then used the same σ for all other sequences. The ridge parameter is set by the optimal value we obtain via 5-fold cross-validation on the training set by using the MC sequence."

Hardware Specification | No | The paper discusses the running times of experiments but does not provide specific hardware details such as CPU/GPU models, memory, or other machine specifications.

Software Dependencies | No | The paper mentions using MATLAB for Halton and Sobol' sequences, and CVX for optimization, but does not provide version numbers for these software components.

Experiment Setup | Yes | "The ridge parameter is set by the optimal value we obtain via 5-fold cross-validation on the training set by using the MC sequence. For each data set, we performed 5-fold cross-validation when using random Fourier features (MC sequence) to set the bandwidth σ, and then used the same σ for all other sequences."
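The quasi-random Fourier feature map named in the Pseudocode row can be sketched in a few lines. The sketch below is an illustrative reconstruction, not the authors' code: it assumes a Gaussian kernel, uses SciPy's scrambled Halton sequence as the QMC point set, and the function name, parameters, and normalization are our own choices.

```python
import numpy as np
from scipy.stats import norm, qmc

def qmc_fourier_features(X, s, sigma, seed=0):
    """Quasi-random Fourier feature map for the Gaussian kernel
    k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).

    Frequencies are obtained by pushing a scrambled Halton sequence
    through the Gaussian inverse CDF, instead of i.i.d. Monte Carlo
    sampling. Returns the complex feature map Psi_hat(x) in C^s.
    """
    n, d = X.shape
    halton = qmc.Halton(d=d, scramble=True, seed=seed)
    u = halton.random(s)                       # s low-discrepancy points in (0,1)^d
    u = np.clip(u, 1e-12, 1 - 1e-12)           # guard the inverse CDF
    W = norm.ppf(u) / sigma                    # frequencies ~ N(0, sigma^-2 I)
    # Psi_hat(x)_j = exp(i w_j^T x) / sqrt(s), so Psi_hat Psi_hat^* ~ K
    return np.exp(1j * (X @ W.T)) / np.sqrt(s)

# Usage: compare the QMC kernel estimate with the exact Gaussian kernel.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
Psi = qmc_fourier_features(X, s=2048, sigma=1.0)
K_hat = (Psi @ Psi.conj().T).real
K_true = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / 2.0)
```

With s = 2048 features the entrywise error of `K_hat` against `K_true` is already small; the paper's point is that low-discrepancy sequences reach a given accuracy with fewer features than plain Monte Carlo.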