Orthogonal Random Features
Authors: Felix Xinnan X. Yu, Ananda Theertha Suresh, Krzysztof M. Choromanski, Daniel N. Holtmann-Rice, Sanjiv Kumar
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on several datasets verify the effectiveness of ORF and SORF over the existing methods. |
| Researcher Affiliation | Industry | Google Research, New York {felixyu, theertha, kchoro, dhr, sanjivk}@google.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link regarding the release of open-source code for the described methodology. |
| Open Datasets | Yes | We first show kernel approximation performance on six datasets. The input feature dimension d is set to a power of 2 by padding zeros or subsampling. Figure 4 compares the mean squared error (MSE) of all methods. For fixed D, the kernel approximation MSE exhibits the following ordering: SORF ≃ ORF < QMC [25] < RFF [19] < other fast kernel approximations [13, 28]. We also apply ORF and SORF to classification tasks. Table 2 shows classification accuracy for different kernel approximation techniques with a (linear) SVM classifier. Datasets mentioned include LETTER, FOREST, USPS, CIFAR, MNIST, GISETTE. (ORF and SORF are sketched in code after the table.) |
| Dataset Splits | No | The paper mentions using datasets for classification tasks but does not provide specific details on training, validation, or test set splits, nor does it specify cross-validation methods. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks) required to replicate the experiments. |
| Experiment Setup | Yes | Throughout the experiments, σ for each dataset is chosen to be the mean distance to the 50th ℓ2 nearest neighbor for 1,000 sampled datapoints, which empirically yields good classification results [28]. On the role of σ: a very small σ leads to overfitting, while a very large σ provides no discriminative power for classification. (This heuristic is sketched in code after the table.) |
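
As a reproduction aid, here is a minimal sketch of the ORF construction: standard random Fourier features (RFF) with the Gaussian projection matrix replaced by a uniformly random orthogonal matrix whose rows are rescaled to chi-distributed norms, as the paper prescribes. It assumes NumPy; the function name `orf_features` and its defaults are illustrative, not from the paper.

```python
import numpy as np

def orf_features(X, D, sigma, seed=0):
    """Map X of shape (n, d) to 2*D random Fourier features approximating
    the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)) via ORF."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    blocks = []
    for _ in range((D + d - 1) // d):          # stack ceil(D/d) orthogonal blocks
        G = rng.standard_normal((d, d))
        Q, _ = np.linalg.qr(G)                 # uniformly random orthogonal rows
        S = np.sqrt(rng.chisquare(d, size=d))  # chi-distributed row norms, matching
        blocks.append(S[:, None] * Q)          # an unstructured Gaussian matrix
    W = np.vstack(blocks)[:D] / sigma          # (D, d) projection matrix
    Z = X @ W.T
    # Inner products of the [cos, sin] features give an unbiased
    # estimate of the Gaussian kernel.
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(D)
```

Dropping the QR step and the row rescaling (i.e., projecting with G directly) recovers plain RFF, the baseline against which the MSE ordering above is measured.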
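
SORF replaces the dense orthogonal matrix with the structured product HD1HD2HD3 of Walsh-Hadamard and random sign-flip matrices, which the paper computes in O(d log d) time via the fast Walsh-Hadamard transform. The sketch below, assuming SciPy's `hadamard` helper, forms the matrix explicitly for clarity instead of using the fast transform; d must be a power of 2, consistent with the zero-padding noted in the table.

```python
import numpy as np
from scipy.linalg import hadamard

def sorf_matrix(d, sigma, seed=0):
    """Return the (d, d) SORF projection sqrt(d)/sigma * H D1 H D2 H D3,
    where H is the normalized Walsh-Hadamard matrix and each Di is a
    diagonal matrix of random signs. Requires d to be a power of 2."""
    rng = np.random.default_rng(seed)
    H = hadamard(d) / np.sqrt(d)        # normalized, hence orthogonal
    W = np.eye(d)
    for _ in range(3):                  # accumulate H D1 H D2 H D3
        signs = rng.choice([-1.0, 1.0], size=d)
        W = W @ H @ np.diag(signs)
    return np.sqrt(d) / sigma * W
```

The same [cos, sin] feature map as in the ORF sketch then applies to `X @ sorf_matrix(d, sigma).T`.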
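
The bandwidth heuristic quoted in the Experiment Setup row is also straightforward to reproduce. This sketch assumes NumPy with brute-force distance computation (a KD-tree would scale better to large datasets); `choose_sigma` is an illustrative name, not from the paper.

```python
import numpy as np

def choose_sigma(X, n_samples=1000, k=50, seed=0):
    """Estimate the Gaussian kernel bandwidth as the mean distance from
    n_samples random datapoints to their kth l2 nearest neighbor."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n_samples, len(X)), replace=False)
    sample = X[idx]
    # Squared Euclidean distances from each sampled point to all points.
    d2 = (np.sum(sample**2, axis=1)[:, None]
          + np.sum(X**2, axis=1)[None, :]
          - 2.0 * sample @ X.T)
    dists = np.sqrt(np.maximum(np.sort(d2, axis=1), 0.0))
    # Column 0 is each point's zero distance to itself, so column k
    # holds the distance to the kth nearest neighbor.
    return dists[:, k].mean()
```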