Stochastic Optimization for Kernel PCA
Authors: Lijun Zhang, Tianbao Yang, Jinfeng Yi, Rong Jin, Zhi-Hua Zhou
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we perform several experiments to examine the performance of our method. **Experimental Setting:** We compare our stochastic algorithm for kernel PCA (SKPCA) with the following methods. 1. Baseline (Schölkopf, Smola, and Müller 1998), which calculates the kernel matrix K explicitly and eigendecomposes it. 2. Approximation based on the Nyström method (Drineas and Mahoney 2005; Zhang, Tsang, and Kwok 2008), which uses the Nyström method to find a low-rank approximation of K and eigendecomposes it. 3. Kernel Hebbian Algorithm (KHA) (Kim, Franz, and Schölkopf 2005), which is an iterative approach for kernel PCA. The experiments are done on two benchmark data sets: Mushrooms (Chang and Lin 2011) and Magic (Frank and Asuncion 2010), which contain 8,124 and 19,020 examples, respectively. We choose these two medium-sized data sets because they can be handled by Baseline and thus allow us to compare the different methods quantitatively. For all the experiments, we repeat them 10 times and report the averaged result. **Experimental Results:** We first examine the convergence rate of SKPCA. We run SKPCA with four different combinations of the parameter λ and the number of random Fourier components k. In Fig. 1(a), we report the normalized recovery error ‖Z_t − K‖²_F / n² with respect to the number of iterations t on the Mushrooms data set. *(A code sketch of this error metric follows the table.)* |
| Researcher Affiliation | Collaboration | Lijun Zhang (1,2), Tianbao Yang (3), Jinfeng Yi (4), Rong Jin (5), Zhi-Hua Zhou (1,2). 1: National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; 2: Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, China; 3: Department of Computer Science, the University of Iowa, Iowa City, USA; 4: IBM Thomas J. Watson Research Center, Yorktown Heights, USA; 5: Alibaba Group, Seattle, USA |
| Pseudocode | Yes | Algorithm 1: A Stochastic Algorithm for Kernel PCA. Input: the number of trials T and the regularization parameter λ. 1: Initialize Z_1 = 0. 2: for t = 1, 2, ..., T do 3: sample a random matrix ξ_t; 4: η_t = 2/t; 5: Z_{t+1} = D_{η_t λ}[(1 − η_t) Z_t + η_t ξ_t]; 6: end for. 7: Calculate the nonzero eigensystem of (1/2)(Z_{T+1} + Z_{T+1}^⊤): {(u_i, σ_i)}_{i=1}^k. 8: return {(u_i, σ_i + λ)}_{i=1}^k. *(A hedged Python sketch of this algorithm appears after the table.)* |
| Open Source Code | No | The paper does not provide an explicit statement or link to the source code for the methodology described. |
| Open Datasets | Yes | The experiments are done on two benchmark data sets: Mushrooms (Chang and Lin 2011) and Magic (Frank and Asuncion 2010), which contain 8,124 and 19,020 examples, respectively. |
| Dataset Splits | No | The paper mentions using specific datasets but does not specify any training, validation, or test splits, nor does it describe cross-validation. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'LIBSVM' but does not provide a specific version number. It also mentions 'Gaussian kernel' and 'random Fourier features' but these are methods/components, not specific software dependencies with versions. |
| Experiment Setup | Yes | In order to run SKPCA, we need to decide the value of the parameter λ in (3), which in turn determines the number of eigenvectors used in kernel PCA. [...] We choose the Gaussian kernel κ(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²)), and set the kernel width σ to the 20th percentile of the pairwise distances (Mallapragada et al. 2009). The random matrix in SKPCA is constructed by random Fourier features (Rahimi and Recht 2008). Although the optimal step size of KHA in theory is 1/t, we found it led to very slow convergence, and thus set it to 0.05 as suggested by (Kim, Franz, and Schölkopf 2005). *(A sketch of the kernel-width heuristic and the random-feature construction follows the table.)* |
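
The pseudocode quoted above is terse, so here is a minimal Python sketch of Algorithm 1. It assumes that D_{η_t λ} is spectral soft-thresholding (the proximal operator of a trace-norm regularizer; the paper's definition of D should be checked before relying on this), and that `sample_xi` is any routine returning a random matrix ξ_t with E[ξ_t] = K, such as the random-Fourier construction sketched below. All names here are illustrative, not taken from the paper.

```python
import numpy as np

def svd_shrink(Z, tau):
    # Assumed semantics of D_tau in Algorithm 1: soft-threshold the
    # singular values of Z (prox operator of a trace-norm term).
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def skpca(sample_xi, n, T, lam):
    # sample_xi() must return a random n x n matrix xi_t with E[xi_t] = K.
    Z = np.zeros((n, n))
    for t in range(1, T + 1):
        eta = 2.0 / t                                       # step size eta_t = 2/t
        Z = svd_shrink((1.0 - eta) * Z + eta * sample_xi(), eta * lam)
    S = 0.5 * (Z + Z.T)                                     # symmetrize Z_{T+1}
    w, U = np.linalg.eigh(S)                                # eigensystem of S
    keep = w > 1e-10                                        # nonzero eigenvalues only
    return U[:, keep], w[keep] + lam                        # eigenvalues shifted by +lambda
```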
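The setup row names two concrete ingredients: the kernel-width heuristic and the random-Fourier-feature construction of ξ_t. Below is a sketch under stated assumptions (SciPy available for `pdist`; the sampling convention of Rahimi and Recht 2008 for the Gaussian kernel); function and parameter names are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist

def kernel_width(X, pct=20):
    # sigma = the 20th percentile of pairwise distances
    # (Mallapragada et al. 2009), per the experiment setup.
    return np.percentile(pdist(X), pct)

def sample_xi_rff(X, k, sigma, rng):
    # One random matrix xi_t with E[xi_t] = K for the Gaussian kernel
    # exp(-||x_i - x_j||^2 / (2 sigma^2)), built from k random Fourier
    # components (Rahimi and Recht 2008).
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, k))  # omega ~ N(0, sigma^{-2} I)
    b = rng.uniform(0.0, 2.0 * np.pi, size=k)       # random phases
    Phi = np.sqrt(2.0 / k) * np.cos(X @ W + b)      # n x k random feature map
    return Phi @ Phi.T                              # unbiased estimate of K
```

Wired into the algorithm sketch above, a call might look like `skpca(lambda: sample_xi_rff(X, k, sigma, rng), n=len(X), T=100, lam=0.1)` (all parameter values illustrative).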
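Finally, the convergence metric quoted in the Research Type row, the normalized recovery error ‖Z_t − K‖²_F / n², is straightforward to reproduce:

```python
import numpy as np

def normalized_recovery_error(Z, K):
    # ||Z - K||_F^2 / n^2, the quantity plotted in Fig. 1(a) of the paper.
    n = K.shape[0]
    return np.linalg.norm(Z - K, ord='fro') ** 2 / n ** 2
```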