On Data-Dependent Random Features for Improved Generalization in Supervised Learning

Authors: Shahin Shahrampour, Ahmad Beirami, Vahid Tarokh

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results on several benchmark datasets further verify that our method requires smaller number of random features to achieve a certain generalization error compared to the state-of-the-art while introducing negligible pre-processing overhead.
Researcher Affiliation | Academia | Shahin Shahrampour, Ahmad Beirami, Vahid Tarokh; School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA.
Pseudocode | Yes | Algorithm 1 Energy-based Exploration of Random Features (EERF). Input: {(x_n, y_n)}_{n=1}^N, the feature map φ(·,·), integers M_0 and M where M ≤ M_0, initial sampling distribution P_0. 1: Draw samples {ω_m}_{m=1}^{M_0} independently from P_0. 2: Evaluate the samples in S(·), the empirical score in (7). 3: Sort |S(·)| for all M_0 samples in descending order, and let {ω_m}_{m=1}^M be the top M arguments, i.e., the ones that give the top M values in the sorted array. Output: {ω_m}_{m=1}^M.
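
A minimal NumPy sketch of the three steps quoted above is given below. The cosine random Fourier feature map, the use of N(0, I_d) as the initial distribution P_0, the bandwidth entering through x/σ, and the kernel-alignment-style proxy (1/N)·Σ_n y_n φ(x_n, ω) standing in for the paper's empirical score (7) are all assumptions, not the authors' released code.

import numpy as np

def eerf_select(X, y, M0, M, sigma, seed=None):
    """Sketch of Algorithm 1 (EERF): draw M0 candidate frequencies, score them, keep the top M."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    # Step 1: draw candidates from the assumed initial distribution P0 = N(0, I_d);
    # applying the bandwidth sigma to the data is an assumption about where sigma enters.
    omegas = rng.standard_normal((M0, d))
    phases = rng.uniform(0.0, 2.0 * np.pi, size=M0)
    Phi = np.cos((X / sigma) @ omegas.T + phases)      # phi(x, omega), shape (N, M0)
    # Step 2: empirical score for each candidate; an assumed alignment-style proxy for Eq. (7).
    scores = np.abs(Phi.T @ y) / N                     # |S(omega)|, shape (M0,)
    # Step 3: keep the M candidates with the largest |S(omega)|.
    top = np.argsort(scores)[::-1][:M]
    return omegas[top], phases[top]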
Open Source Code | No | The paper states 'All codes are written in MATLAB' but does not provide any link or explicit statement about making the source code for their method publicly available.
Open Datasets | Yes | We apply our proposed method to several datasets from the UCI Machine Learning Repository. ... Table 2: The description of the datasets used for Gaussian kernel: the number of features, training samples, and test samples are denoted by d, Ntrain, and Ntest, respectively. Buzz prediction on Twitter (Regression, d=77, Ntrain=93800, Ntest=46200); Online news popularity (Regression, d=58, Ntrain=26561, Ntest=13083); Adult (Classification, d=122, Ntrain=32561, Ntest=16281); MNIST (Classification, d=784, Ntrain=60000, Ntest=10000).
Dataset Splits | Yes | If training and test sets are provided explicitly, we use them accordingly; otherwise, we split the dataset randomly. ... We then tune the regularization parameter by trying different values from {10^-5, 10^-4, . . . , 10^5}. ... Table 2: The description of the datasets used for Gaussian kernel: the number of features, training samples, and test samples are denoted by d, Ntrain, and Ntest, respectively.
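
The split policy quoted above can be mirrored with a short helper. Using scikit-learn's train_test_split and defaulting to the roughly 2:1 train/test ratio seen in Table 2 are assumptions; the paper only states that a random split is used when no explicit one is provided.

from sklearn.model_selection import train_test_split

def make_split(X, y, X_test=None, y_test=None, test_size=1/3, seed=0):
    # Use the provided train/test split when one exists (e.g. Adult, MNIST);
    # otherwise draw a random split (the ratio is an assumption).
    if X_test is not None:
        return X, y, X_test, y_test
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size, random_state=seed)
    return X_tr, y_tr, X_te, y_te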
Hardware Specification | No | The paper states: 'All codes are written in MATLAB and run on a machine with CPU 2.9 GHz and 16 GB memory.' This gives the CPU clock speed and memory size but does not specify a CPU model or any GPU details.
Software Dependencies | No | The paper mentions 'All codes are written in MATLAB' but does not specify a MATLAB version or list any other software dependencies with version information.
Experiment Setup | Yes | For all methods (including ours) in the Gaussian case, we sample random features from N(0, I_d). The value of σ for each dataset is chosen to be the mean distance of the 50th ℓ2 nearest neighbor... We then tune the regularization parameter by trying different values from {10^-5, 10^-4, . . . , 10^5}. ... Table 3: N_0 is the number of samples we use for pre-processing, and M_0 is the number of random features we initially generate. M is the number of random features used by both algorithms for eventual prediction.
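
The two setup choices quoted above, the 50th-nearest-neighbor bandwidth heuristic and the regularization grid {10^-5, ..., 10^5}, can be sketched as follows. The use of scikit-learn's NearestNeighbors, ridge regression as the downstream model, subsampling before the neighbor search, and 5-fold cross-validation are assumptions not stated in the paper.

import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

def bandwidth_50nn(X, n_samples=2000, seed=0):
    """sigma = mean distance to the 50th l2 nearest neighbor (computed on a subsample, an assumption)."""
    rng = np.random.default_rng(seed)
    sub = X[rng.choice(len(X), size=min(n_samples, len(X)), replace=False)]
    nn = NearestNeighbors(n_neighbors=51).fit(sub)   # 51 = the point itself + its 50 neighbors
    dists, _ = nn.kneighbors(sub)
    return dists[:, 50].mean()

# Regularization parameter swept over {10^-5, 10^-4, ..., 10^5}, as quoted above;
# ridge regression and 5-fold CV are assumptions about the model-selection protocol.
param_grid = {"alpha": [10.0 ** k for k in range(-5, 6)]}
search = GridSearchCV(Ridge(), param_grid, cv=5)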