On Data-Dependent Random Features for Improved Generalization in Supervised Learning
Authors: Shahin Shahrampour, Ahmad Beirami, Vahid Tarokh
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results on several benchmark datasets further verify that our method requires a smaller number of random features to achieve a certain generalization error compared to the state-of-the-art while introducing negligible pre-processing overhead. |
| Researcher Affiliation | Academia | Shahin Shahrampour, Ahmad Beirami, Vahid Tarokh; School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA |
| Pseudocode | Yes | Algorithm 1: Energy-based Exploration of Random Features (EERF). Input: {(x_n, y_n)}_{n=1}^{N}, the feature map φ(·, ·), integers M_0 and M with M ≤ M_0, initial sampling distribution P_0. 1: Draw samples {ω_m}_{m=1}^{M_0} independently from P_0. 2: Evaluate each sample under S(·), the empirical score in Eq. (7). 3: Sort \|S(·)\| over all M_0 samples in descending order, and let {ω_m}_{m=1}^{M} be the top M arguments, i.e., the ones that give the top M values in the sorted array. Output: {ω_m}_{m=1}^{M}. (A runnable sketch of this procedure appears after the table.) |
| Open Source Code | No | The paper states 'All codes are written in MATLAB' but does not provide any link or explicit statement about making the source code for their method publicly available. |
| Open Datasets | Yes | We apply our proposed method to several datasets from the UCI Machine Learning Repository. ... Table 2 (datasets used for the Gaussian kernel; d, Ntrain, and Ntest denote the number of features, training samples, and test samples, respectively): Buzz prediction on Twitter (regression, d=77, Ntrain=93800, Ntest=46200); Online news popularity (regression, d=58, Ntrain=26561, Ntest=13083); Adult (classification, d=122, Ntrain=32561, Ntest=16281); MNIST (classification, d=784, Ntrain=60000, Ntest=10000). |
| Dataset Splits | Yes | If training and test sets are provided explicitly, we use them accordingly; otherwise, we split the dataset randomly. ... We then tune the regularization parameter by trying different values from {10^-5, 10^-4, ..., 10^5}. ... Table 2: The description of the datasets used for Gaussian kernel: the number of features, training samples, and test samples are denoted by d, Ntrain, and Ntest, respectively. (A sketch of the split-and-tune step appears after the table.) |
| Hardware Specification | No | The paper states: 'All codes are written in MATLAB and run on a machine with CPU 2.9 GHz and 16 GB memory.' This provides general information about CPU speed and RAM but does not specify a CPU model or any GPU details. |
| Software Dependencies | No | The paper mentions 'All codes are written in MATLAB' but does not specify a version number for MATLAB or any other software dependencies with version information. |
| Experiment Setup | Yes | For all methods (including ours) in the Gaussian case, we sample random features from N(0, I_d). The value of σ for each dataset is chosen to be the mean distance of the 50th ℓ2 nearest neighbor... We then tune the regularization parameter by trying different values from {10^-5, 10^-4, ..., 10^5}. ... Table 3: N_0 is the number of samples we use for pre-processing, and M_0 is the number of random features we initially generate. M is the number of random features used by both algorithms for eventual prediction. (A sketch of the bandwidth heuristic appears after the table.) |
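
The EERF procedure quoted in the Pseudocode row is compact enough to sketch in code. Below is a minimal Python rendering, assuming the Gaussian-kernel setting (candidates drawn from N(0, I_d)) and a cosine feature map with a random phase; the score is taken to be the empirical alignment \|S(ω)\| = \|(1/N) Σ_n y_n φ(x_n; ω)\|, as used in Algorithm 1. The function name and the phase term are illustrative assumptions, not the authors' MATLAB code.

```python
import numpy as np

def eerf(X, y, M0, M, seed=0):
    """Energy-based Exploration of Random Features (Algorithm 1, sketched).

    Draws M0 candidate frequencies from P0 = N(0, I_d), scores each by the
    empirical alignment |S(omega)|, and keeps the top M.
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape
    omegas = rng.standard_normal((M0, d))       # step 1: candidates from P0
    phases = rng.uniform(0.0, 2.0 * np.pi, M0)  # random phases (assumption)
    Phi = np.cos(X @ omegas.T + phases)         # phi(x_n; omega_m), shape (N, M0)
    scores = np.abs(Phi.T @ y) / N              # step 2: |S(omega_m)| per candidate
    top = np.argsort(scores)[::-1][:M]          # step 3: keep the top-M scores
    return omegas[top], phases[top]
```

The selected frequencies then define an M-dimensional feature map with entries sqrt(2/M)·cos(ω_m^T x + b_m), which feeds a regularized linear model for the eventual prediction.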
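
The bandwidth rule in the Experiment Setup row (σ equal to the mean distance to the 50th ℓ2 nearest neighbor) could be computed along the following lines. The random subsampling and the Gram-matrix distance trick are implementation assumptions to keep the pairwise computation tractable; the paper states only the nearest-neighbor rule itself.

```python
import numpy as np

def bandwidth_sigma(X, k=50, n_sub=1000, seed=0):
    """sigma = mean distance to the k-th l2 nearest neighbor (computed on a
    random subsample of n_sub points; the subsampling is an assumption)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n_sub, len(X)), replace=False)
    Xs = X[idx]
    # Pairwise squared distances via the Gram matrix (memory: n_sub x n_sub).
    sq = (Xs ** 2).sum(axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * Xs @ Xs.T, 0.0))
    D.sort(axis=1)                # column 0 is each point's zero self-distance
    return float(D[:, k].mean())  # mean distance to the k-th nearest neighbor
```

With σ in hand, scaling the inputs as X/σ makes ω ~ N(0, I_d) correspond to a Gaussian kernel of bandwidth σ.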
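
Similarly, the random split and the regularization sweep over {10^-5, ..., 10^5} from the Dataset Splits row might look as follows. Holding out part of the training data for the selection, the 2/3 split fraction, and ridge regression as the learner are all assumptions about details the paper leaves open.

```python
import numpy as np

def split_and_tune(Phi, y, train_frac=2/3, seed=0):
    """Random train/validation split plus a sweep of the ridge parameter
    lam over the grid {1e-5, ..., 1e5}; returns the best lam."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    n_tr = int(train_frac * len(y))
    tr, va = perm[:n_tr], perm[n_tr:]
    M = Phi.shape[1]
    best_lam, best_err = None, np.inf
    for lam in (10.0 ** p for p in range(-5, 6)):
        # Regularized least squares on the random-feature map.
        w = np.linalg.solve(Phi[tr].T @ Phi[tr] + lam * np.eye(M),
                            Phi[tr].T @ y[tr])
        err = np.mean((Phi[va] @ w - y[va]) ** 2)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam
```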