Scalable Kernel Methods via Doubly Stochastic Gradients

Authors: Bo Dai, Bo Xie, Niao He, Yingyu Liang, Anant Raj, Maria-Florina F Balcan, Le Song

NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that a function learned by this procedure after t iterations converges to the optimal function in the reproducing kernel Hilbert space in rate O(1/t), and achieves a generalization bound of O(1/√t). Our approach can readily scale kernel methods up to the regimes which are dominated by neural nets. We show competitive performances of our approach as compared to neural nets in datasets such as 2.3 million energy materials from MolecularSpace, 8 million handwritten digits from MNIST, and 1 million photos from ImageNet using convolution features.
Researcher Affiliation | Academia | ¹Georgia Institute of Technology, {bodai, bxie33, nhe6, araj34}@gatech.edu, lsong@cc.gatech.edu; ²Princeton University, yingyul@cs.princeton.edu; ³Carnegie Mellon University, ninamf@cs.cmu.edu
Pseudocode | Yes | Algorithm 1: {α_i}_{i=1}^t = Train(P(x, y))
Require: P(ω), φ_ω(x), l(f(x), y), ν.
1: for i = 1, ..., t do
2:   Sample (x_i, y_i) ~ P(x, y).
3:   Sample ω_i ~ P(ω) with seed i.
4:   f(x_i) = Predict(x_i, {α_j}_{j=1}^{i-1}).
5:   α_i = -γ_i l'(f(x_i), y_i) φ_{ω_i}(x_i).
6:   α_j = (1 - γ_i ν) α_j, for j = 1, ..., i - 1.
7: end for
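For concreteness, the following is a minimal NumPy sketch of how Algorithm 1 could be instantiated. It assumes an RBF kernel approximated with random Fourier features, squared loss (so l'(f(x), y) = f(x) - y), and step size γ_i = θ/i; the names (phi, predict, train, theta, n_basis) and all hyperparameters are illustrative, not taken from the paper. The point it illustrates is that ω_i is regenerated from seed i at prediction time, so only the coefficients α_i need to be stored.

```python
# Minimal sketch of doubly stochastic functional gradient descent (Algorithm 1),
# assuming an RBF kernel with random Fourier features, squared loss, and step
# size gamma_i = theta / i. All names and hyperparameters are illustrative.
import numpy as np

def phi(x, seed, d, sigma=1.0, n_basis=1):
    # Random Fourier feature phi_omega(x) for the RBF kernel; omega is regenerated
    # from `seed`, so it never has to be stored alongside the coefficient alpha_i.
    rng = np.random.RandomState(seed)
    omega = rng.randn(n_basis, d) / sigma
    b = rng.uniform(0, 2 * np.pi, n_basis)
    return np.sqrt(2.0 / n_basis) * np.cos(omega @ x + b)

def predict(x, alphas, d):
    # f(x) = sum_i alpha_i . phi_{omega_i}(x), with omega_i re-sampled from seed i.
    return sum(a @ phi(x, seed=i, d=d) for i, a in enumerate(alphas))

def train(X, y, t, theta=1.0, nu=1e-4):
    n, d = X.shape
    alphas = []
    for i in range(t):
        j = np.random.randint(n)                # sample (x_i, y_i) ~ P(x, y)
        gamma = theta / (i + 1)                 # decaying step size gamma_i
        f_xi = predict(X[j], alphas, d)
        grad = f_xi - y[j]                      # l'(f(x), y) for squared loss
        alphas = [(1 - gamma * nu) * a for a in alphas]          # shrink old coefficients
        alphas.append(-gamma * grad * phi(X[j], seed=i, d=d))    # new coefficient alpha_i
    return alphas
```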
Open Source Code | No | No explicit statement indicating the release of source code (e.g., 'We release our code...', 'The source code for our method is available at...') or a link to a code repository was found in the paper.
Open Datasets | Yes |
Name | Model | # of samples | Input dim | Output range | Virtual
(1) Adult | K-SVM | 32K | 123 | {-1, 1} | no
(2) MNIST 8M 8 vs. 6 [25] | K-SVM | 1.6M | 784 | {-1, 1} | yes
(3) Forest | K-SVM | 0.5M | 54 | {-1, 1} | no
(4) MNIST 8M [25] | K-logistic | 8M | 1568 | {0, ..., 9} | yes
(5) CIFAR 10 [26] | K-logistic | 60K | 2304 | {0, ..., 9} | yes
(6) ImageNet [27] | K-logistic | 1.3M | 9216 | {0, ..., 999} | yes
(7) QuantumMachine [28] | K-ridge | 6K | 276 | [-800, -2000] | yes
(8) MolecularSpace [28] | K-ridge | 2.3M | 2850 | [0, 13] | no
Dataset Splits | No | No explicit statement regarding training/validation/test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) was found.
Hardware Specification | Yes | The algorithms are executed on the machine with 16 AMD 2.4GHz Opteron CPUs and 200G memory.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., 'Python 3.8', 'PyTorch 1.9') were mentioned in the paper.
Experiment Setup | Yes | For algorithms based on low rank kernel matrix approximation and random features, i.e., pegasos and SDCA, we set the rank and number of random features to be 2^8. We use the same batch size for both our algorithm and the competitors. We stop algorithms when they pass through the entire dataset once.
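As a point of reference for this setup, the sketch below builds a fixed 2^8 = 256-dimensional random-feature baseline of the kind used in the pegasos/SDCA comparisons. It assumes scikit-learn's RBFSampler and a hinge-loss linear SGD model in place of the paper's actual solvers; the data, kernel bandwidth, and regularization are placeholders, not the paper's settings. The single pass (max_iter=1) mirrors the stopping rule of halting after one pass through the dataset.

```python
# Fixed-rank random-feature baseline with 2^8 = 256 features, a stand-in for the
# pegasos/SDCA comparison setup; data and hyperparameters are placeholders.
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
X = rng.randn(1000, 54)                       # placeholder inputs (Forest-like input dim)
y = np.where(rng.rand(1000) > 0.5, 1, -1)     # placeholder {-1, 1} labels

features = RBFSampler(gamma=0.5, n_components=2**8, random_state=0)  # 256 random features
Z = features.fit_transform(X)

clf = SGDClassifier(loss="hinge", alpha=1e-4, max_iter=1, tol=None)   # single pass over the data
clf.fit(Z, y)
print("training accuracy:", clf.score(Z, y))
```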