Random Feature Maps for the Itemset Kernel

Authors: Kyohei Atarashi, Subhransu Maji, Satoshi Oyama (pp. 3199-3206)

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we demonstrate the effectiveness of using the proposed maps for real-world datasets. We evaluate the effectiveness of the feature maps on several datasets. We first evaluated the accuracy of our proposed RK feature map on the Movielens 100K dataset (Harper and Konstan 2016)... As shown in Table 1, the RK feature map with the Rademacher distribution had the lowest absolute error and variance... We next evaluated the effectiveness of the SCRK feature map... As shown in Figure 1, when the dimension of the original feature vector d was large, the SCRK feature map was more efficient. We next evaluated the performance of linear models using our proposed RK/SCRK feature maps for the Movielens 100K dataset... As shown in Figure 2, when the number of random features D = 1,248 = 16d, the accuracies of the linear SVMs using the proposed RK feature map were as good as those of the non-linear SVMs, FMs, and the all-subsets model. We also evaluated the performance of the linear models using the RK/SCRK feature maps and the existing models for the phishing and IJCNN datasets (Mohammad, Thabtah, and McCluskey 2012; Prokhorov 2001).
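The kernel-approximation experiment quoted above measures the mean absolute error between the exact itemset kernel and its random-feature estimate over repeated trials. Below is a minimal NumPy sketch of that protocol, assuming the all-subsets itemset kernel K(x, y) = prod_j (1 + x_j y_j) (the tractable special case) and toy random inputs in place of the actual MovieLens features; it is an illustration of the protocol, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, trials = 78, 16 * 78, 100             # d = 78 matches the MovieLens feature dimension
x, y = rng.standard_normal((2, d)) * 0.1    # toy inputs (illustrative, not MovieLens data)
exact = np.prod(1.0 + x * y)                # exact all-subsets itemset kernel value

errors = []
for _ in range(trials):
    W = rng.choice([-1.0, 1.0], size=(D, d))          # D Rademacher random vectors
    zx = np.prod(1.0 + x * W, axis=1) / np.sqrt(D)    # Z(x), one coordinate per omega_s
    zy = np.prod(1.0 + y * W, axis=1) / np.sqrt(D)    # Z(y)
    errors.append(abs(zx @ zy - exact))               # |approximate - exact|
print("mean absolute error:", np.mean(errors))
```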
Researcher Affiliation | Academia | Kyohei Atarashi (Hokkaido University, atarashi_k@complex.ist.hokudai.ac.jp); Subhransu Maji (University of Massachusetts, Amherst, smaji@cs.umass.edu); Satoshi Oyama (Hokkaido University / RIKEN AIP, oyama@ist.hokudai.ac.jp)
Pseudocode | Yes | Algorithm 1 (Random Kernel Feature Map). Input: x ∈ ℝ^d, S ⊆ 2^[d]. Step 1: generate D Rademacher vectors ω_1, ..., ω_D ∈ {−1, +1}^d. Step 2: compute the D itemset kernels K_S(x, ω_s) for all s ∈ [D]. Output: Z(x) = (1/√D)(K_S(x, ω_1), ..., K_S(x, ω_D)).
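Algorithm 1 is straightforward to sketch in NumPy. The following is a minimal, illustrative implementation, assuming the all-subsets family S = 2^[d], for which the itemset kernel factorizes as K_S(x, ω) = prod_j (1 + x_j ω_j); the function name and structure are ours, not the authors'.

```python
import numpy as np

def rk_feature_map(X, D, seed=0):
    """Random Kernel (RK) feature map (Algorithm 1) for the all-subsets itemset kernel."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Step 1: draw D Rademacher vectors omega_1..omega_D in {-1, +1}^d
    W = rng.choice([-1.0, 1.0], size=(D, d))
    # Step 2: evaluate the itemset kernel K_S(x, omega_s) = prod_j (1 + x_j * omega_sj)
    K = np.prod(1.0 + X[:, None, :] * W[None, :, :], axis=2)   # shape (n, D)
    # Output: Z(x) = (1/sqrt(D)) * (K_S(x, omega_1), ..., K_S(x, omega_D))
    return K / np.sqrt(D)
```

With this map, Z(x)ᵀZ(y) is an unbiased estimate of K_S(x, y) in the all-subsets case, since for Rademacher ω each factor satisfies E[(1 + x_j ω_j)(1 + y_j ω_j)] = 1 + x_j y_j.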
Open Source Code | No | The paper mentions using open-source libraries such as SciPy, scikit-learn, and polylearn, but it does not state that the authors release their own code for the proposed methods.
Open Datasets | Yes | We first evaluated the accuracy of our proposed RK feature map on the Movielens 100K dataset (Harper and Konstan 2016)... We also evaluated the performance of the linear models using the RK/SCRK feature maps and the existing models for the phishing and IJCNN datasets (Mohammad, Thabtah, and McCluskey 2012; Prokhorov 2001).
Dataset Splits | Yes | We converted the recommender system problem to a binary classification problem. We binarized the original ratings (from 1 to 5) by using 5 as a threshold. There were 21,200, 1,000, and 20,202 training, validation, and testing examples, respectively.
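The binarization described above (5-star ratings as the positive class) amounts to a one-line thresholding step; a minimal sketch with illustrative variable names:

```python
import numpy as np

ratings = np.array([4, 5, 3, 5, 1])        # original MovieLens ratings on a 1-5 scale
labels = np.where(ratings >= 5, 1, -1)     # threshold at 5: only 5-star ratings are positive
```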
Hardware Specification | No | The paper discusses computational time and memory efficiency but does not provide any specific details about the hardware (e.g., CPU or GPU models, memory size) used for the experiments.
Software Dependencies | No | We used SciPy (Jones, Oliphant, and Peterson 2001) implementations of FFT and IFFT (scipy.fftpack)... We used LinearSVC and SVC in scikit-learn (Pedregosa et al. 2011)... For the implementation of FMs, we used FactorizationMachineClassifier in polylearn (Niculae 2016). While specific software packages are named, their version numbers are not provided, which limits reproducibility.
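The named dependencies map onto the following imports; since the paper gives no version numbers, none are pinned here:

```python
from scipy.fftpack import fft, ifft                    # FFT/IFFT used by the SCRK map
from sklearn.svm import LinearSVC, SVC                 # linear and non-linear SVM baselines
from polylearn import FactorizationMachineClassifier  # factorization machines (FMs)
```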
Experiment Setup | Yes | The age, living area, gender, and occupation of users and the genre and release year of items were used as features... The dimension of the feature vectors was 78. We calculated the mean absolute errors for these instances over 100 trials using Rademacher, Gaussian, Uniform, and Laplace distributions... We varied the dimension of the random features: 2, 4, 8, and 16 times that of the original feature vectors... We set D = 8092 for all d. We converted the recommender system problem to a binary classification problem... We normalized each feature vector and varied the random-features dimension... All the methods have a regularization hyperparameter, which we set on the basis of the validation accuracy of the non-linear SVMs. For the linear SVMs using random feature maps, we ran ten trials with a different random seed for each trial and calculated the mean of the values. We used a Rademacher distribution for the random vectors. For the FMs and the all-subsets model, we also ran ten trials and calculated the mean of the values. We used coordinate descent... For the rank hyperparameter, we followed Blondel et al. (2016a) and set it to 30.
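Putting the setup row together, here is a hedged end-to-end sketch of one configuration (D = 16d RK features with a Rademacher distribution, ten random seeds, a linear SVM). X_train, y_train, X_test, and y_test are hypothetical preloaded, normalized arrays, the RK transform follows the all-subsets sketch after Algorithm 1, and the regularization constant C is a placeholder for the validation-tuned value:

```python
import numpy as np
from sklearn.svm import LinearSVC

def rk_transform(X, W):
    # RK features for the all-subsets itemset kernel: Z(x)_s = K_S(x, omega_s) / sqrt(D)
    return np.prod(1.0 + X[:, None, :] * W[None, :, :], axis=2) / np.sqrt(W.shape[0])

d = X_train.shape[1]     # X_train/X_test, y_train/y_test assumed preloaded and normalized
D = 16 * d               # the largest random-feature dimension reported
scores = []
for seed in range(10):   # ten trials, each with a different random seed
    W = np.random.default_rng(seed).choice([-1.0, 1.0], size=(D, d))  # Rademacher vectors
    clf = LinearSVC(C=1.0)   # C is illustrative; the paper tunes it on validation accuracy
    clf.fit(rk_transform(X_train, W), y_train)
    scores.append(clf.score(rk_transform(X_test, W), y_test))
print("mean accuracy over 10 seeds:", np.mean(scores))
```

Note that the same Rademacher matrix W must be used to transform both the training and test sets within a trial, otherwise the learned weights would not correspond to the test-time features.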