Extending Kernel PCA through Dualization: Sparsity, Robustness and Fast Algorithms

Authors: Francesco Tonin, Alex Lambert, Panagiotis Patrinos, Johan Suykens

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The proposed method is evaluated on synthetic and real-world benchmarks, showing significant speedup in KPCA training time as well as highlighting the benefits in terms of robustness and sparsity.
Researcher Affiliation | Academia | ESAT-STADIUS, KU Leuven, Belgium.
Pseudocode | Yes | Algorithm 1: DCA for Moreau envelope objectives.
Open Source Code | Yes | The code is available at https://github.com/taralloc/dc-kpca.
Open Datasets | Yes | For real-world data, we consider datasets from LIBSVM (Chang & Lin, 2011), UCI (Dua & Graff, 2017), and common deep learning benchmarks: Iris (n = 150, d = 4), the bioinformatics dataset Protein (n = 14895, d = 357), the text categorization dataset RCV1 (n = 20242, d = 47236), and the outputs of the second-to-last layer of a ResNet18 (He et al., 2016) trained on the computer vision dataset CIFAR-10 (n = 60000, d = 512).
Dataset Splits | No | The paper mentions a '20% test split' but does not explicitly provide details for a separate validation split or the training split percentages/counts.
Hardware Specification | Yes | Experiments are implemented in Python 3.10 on a machine with a 3.7GHz Intel i7-8700K processor and 64GB RAM.
Software Dependencies | No | The paper mentions 'Python 3.10' but does not specify version numbers for other key libraries, solvers, or frameworks used in the implementation.
Experiment Setup | Yes | Regarding optimization for KPCA with the square loss, we employ the L-BFGS algorithm with backtracking line search using the strong Wolfe conditions, with initialization from the standard normal distribution. For the Moreau envelopes, i.e., the Huber and ϵ-insensitive losses, we employ the DC algorithm; for faster convergence, second-order optimization algorithms for this problem with composite smooth + non-smooth structure can also be employed, e.g., (Stella et al., 2017). For the square loss, the accuracy of a dual iterate H_k from our solver at iteration k is measured as the relative difference η = |d(H_k) − d_opt| / d_opt between the dual cost d(H_k) = −(1/2) Tr(H_k^T H_k) + Tr((H_k^T G H_k)^{1/2}) at iteration k and the optimal dual cost d_opt = (1/2) Σ_{i=1}^{s} λ_i, with λ_i the i-th largest eigenvalue of G. In fact, the optimal primal cost is the highest variance in s components, i.e., (1/2) Σ_{i=1}^{s} λ_i, and strong duality holds in our dualization. For all solvers used, we apply the same stopping criterion based on achieving a target tolerance. For the other convoluted losses, we stop the DCA when the absolute variation of the loss is less than machine precision, with at most 1000 iterations. (Sketches of the dual-cost criterion, the L-BFGS setup, and a generic DCA template follow the table.)
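
To make the square-loss stopping criterion in the Experiment Setup row concrete, here is a minimal NumPy sketch. It is our own illustration, not code from the paper's repository; the function names dual_cost and relative_gap are hypothetical. It evaluates d(H_k) and the relative gap η against d_opt = (1/2) Σ_{i=1}^{s} λ_i:

```python
import numpy as np

def dual_cost(H, G):
    """Dual cost d(H) = -1/2 Tr(H^T H) + Tr((H^T G H)^{1/2})."""
    M = H.T @ G @ H                       # s x s symmetric PSD matrix
    w = np.linalg.eigvalsh(M)             # its eigenvalues
    return -0.5 * np.sum(H * H) + np.sum(np.sqrt(np.clip(w, 0.0, None)))

def relative_gap(H, G, s):
    """eta = |d(H) - d_opt| / d_opt, with d_opt = 1/2 the sum of the
    s largest eigenvalues of the Gram matrix G."""
    lam = np.linalg.eigvalsh(G)[::-1]     # eigenvalues of G, descending
    d_opt = 0.5 * np.sum(lam[:s])
    return abs(dual_cost(H, G) - d_opt) / d_opt

# Toy usage: linear-kernel Gram matrix, standard-normal initial iterate.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
G = X @ X.T
H = rng.standard_normal((100, 3))
print(relative_gap(H, G, s=3))
```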
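The row also states that the square-loss problem is solved with L-BFGS under strong Wolfe conditions, initialized from the standard normal distribution. A hypothetical PyTorch rendering of that setup, minimizing the negative dual cost −d(H) reconstructed above (our construction, not necessarily the paper's exact parametrization), could look like:

```python
import torch

torch.manual_seed(0)
n, s = 100, 3
X = torch.randn(n, 5)
G = X @ X.T                                 # linear-kernel Gram matrix
H = torch.randn(n, s, requires_grad=True)   # standard-normal initialization

# L-BFGS with a strong-Wolfe line search, as described in the setup.
opt = torch.optim.LBFGS([H], line_search_fn="strong_wolfe", max_iter=200)

def closure():
    opt.zero_grad()
    w = torch.linalg.eigvalsh(H.T @ G @ H)  # eigenvalues of H^T G H
    neg_dual = 0.5 * (H * H).sum() - torch.sqrt(w.clamp_min(1e-12)).sum()
    neg_dual.backward()
    return neg_dual

opt.step(closure)
print(-closure().item())                    # current dual cost d(H)
```

The strong-Wolfe line search is PyTorch's built-in option for LBFGS; the small clamp on the eigenvalues only guards the square root against round-off.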
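For the Huber and ϵ-insensitive losses, the Pseudocode row points to the paper's Algorithm 1 (DCA for Moreau envelope objectives). The exact algorithm is in the PDF; the following is only a generic difference-of-convex template with the stopping rule stated in the setup (absolute loss variation below machine precision, at most 1000 iterations). All argument names are illustrative:

```python
import numpy as np

def dca(grad_h, solve_convex_subproblem, loss, x0, max_iter=1000):
    """Generic DCA template for an objective g(x) - h(x) with g, h convex.

    Each iteration linearizes the concave part -h at the current iterate
    and solves the convex subproblem
        x_{k+1} = argmin_x  g(x) - <grad_h(x_k), x>.
    """
    x, prev = x0, loss(x0)
    for _ in range(max_iter):
        y = grad_h(x)                    # (sub)gradient of h at x_k
        x = solve_convex_subproblem(y)   # convex step
        cur = loss(x)
        if abs(cur - prev) < np.finfo(float).eps:
            break                        # loss variation below machine precision
        prev = cur
    return x
```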