Deep learning with kernels through RKHM and the Perron-Frobenius operator
Authors: Yuka Hashimoto, Masahiro Ikeda, Hachem Kadri
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We numerically confirm our theory and the validity of the proposed deep RKHM. We compared the generalization property of the deep RKHM to the deep vv RKHS with the same positive definite kernel. For $d = 10$ and $n = 10$, we set $x_i = (a z_i)^2 + \epsilon_i$ as input samples, where $a \in \mathbb{R}^{100 \times 10}$ and $z_i \in \mathbb{R}^{10}$ are randomly generated by $N(0, 0.1)$, the normal distribution with mean 0 and standard deviation 0.1, $(\cdot)^2$ denotes the elementwise product, and $\epsilon_i$ is random noise drawn from $N(0, 10^{-3})$. We reshaped $x_i$ to a 10 by 10 matrix. We set $L = 3$ and $k_j = k I$ for $j = 1, 2, 3$, where $k$ is the Laplacian kernel. For RKHMs, we set $\mathcal{A}_1 = \mathrm{Block}((1, \ldots, 1), d)$, $\mathcal{A}_2 = \mathrm{Block}((2, \ldots, 2), d)$, and $\mathcal{A}_3 = \mathbb{C}^{d \times d}$. This is the autoencoder mentioned in Example 3.1. For vv RKHSs, we set the corresponding Hilbert spaces with the Hilbert–Schmidt inner product. We set the loss function as $\|\frac{1}{n}\sum_{i=1}^{n} \lvert f(x_i) - x_i \rvert_{\mathcal{A}}^2\|_{\mathcal{A}}$ for the deep RKHM and as $\frac{1}{n}\sum_{i=1}^{n} \|f(x_i) - x_i\|_{\mathrm{HS}}^2$ for the deep vv RKHS. Here, $f = f_3 \circ f_2 \circ f_1$. We did not add any terms to the loss function, to see how the loss function with the operator norm affects the generalization performance. We computed the same value $\|E[\lvert f(x) - x \rvert_{\mathcal{A}}^2]\|_{\mathcal{A}} - \|\frac{1}{n}\sum_{i=1}^{n} \lvert f(x_i) - x_i \rvert_{\mathcal{A}}^2\|_{\mathcal{A}}$ for both the RKHM and the vv RKHS. Figure 2 (a) shows the results. (A hedged code sketch of this data generation and the two loss functions appears after the table.) |
| Researcher Affiliation | Collaboration | Yuka Hashimoto (NTT Network Service Systems Laboratories / RIKEN AIP, Tokyo, Japan) yuka.hashimoto@ntt.com; Masahiro Ikeda (RIKEN AIP / Keio University, Tokyo, Japan) masahiro.ikeda@riken.jp; Hachem Kadri (Aix-Marseille University, CNRS, LIS, Marseille, France) hachem.kadri@lis-lab.fr |
| Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Comparison to CNN We compared the deep RKHM to a CNN on the classification task with MNIST [39]. We set $d = 28$ and $n = 20$. |
| Dataset Splits | No | The paper mentions training and testing but does not specify training/test/validation dataset splits or cross-validation details for reproducibility. |
| Hardware Specification | Yes | All the experiments are executed with Python 3.9 and TensorFlow 2.6 on Intel(R) Core(TM) i9 CPU and NVIDIA Quadro RTX 5000 GPU with CUDA 11.7. |
| Software Dependencies | Yes | All the experiments are executed with Python 3.9 and TensorFlow 2.6 on Intel(R) Core(TM) i9 CPU and NVIDIA Quadro RTX 5000 GPU with CUDA 11.7. |
| Experiment Setup | Yes | We set the loss function as $\|\frac{1}{n}\sum_{i=1}^{n} \lvert f(x_i) - x_i \rvert_{\mathcal{A}}^2\|_{\mathcal{A}}$ for the deep RKHM and as $\frac{1}{n}\sum_{i=1}^{n} \|f(x_i) - x_i\|_{\mathrm{HS}}^2$ for the deep vv RKHS. ... For the optimizer, we used SGD. The learning rate is set as 1e-4 both for the deep RKHM and the deep vv RKHS. The initial value of $c_{i,j}$ is set as $a_{i,j} + \epsilon_{i,j}$, where $a_{i,j} \in \mathcal{A}_j$ is the block matrix all of whose elements are 0.1 and $\epsilon_{i,j}$ is randomly drawn from $N(0, 0.05)$. ... The additional term to the loss function is set as $\lambda_1(\|(\eta I + G_L)^{-1}\|_{\mathrm{op}} + \|G_L\|_{\mathrm{op}}) + \lambda_2 \|f_L\|_{\mathcal{M}_L}^2$, where $\eta = 0.01$ and $\lambda_2 = 0.01$ according to Subsection 5.2. ... For the optimizer, we used Adam with learning rate 1e-3 for both the deep RKHM and the CNN. The initial value of $c_{i,j}$ is set as $\epsilon_{i,j}$, where $\epsilon_{i,j}$ is randomly drawn from $N(0, 0.1)$. (A hedged sketch of the regularization term appears after the table.) |
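
As a reading aid for the Research Type row, the following is a minimal NumPy sketch of the quoted data generation and the two loss functions. It assumes the second argument of $N(\cdot, \cdot)$ is a standard deviation (as the quote states for $N(0, 0.1)$), reads the outer $\mathcal{A}$-norm as the operator (spectral) norm on $\mathbb{C}^{d \times d}$, and uses variable names and a random seed of our own; it is a sketch under these assumptions, not the authors' code.

```python
import numpy as np

# Hedged sketch of the quoted setup: d = 10, n = 10, x_i = (a z_i)^2 + eps_i
# with the square taken elementwise, reshaped to a 10 x 10 matrix.
d, n = 10, 10
rng = np.random.default_rng(0)                    # seed is ours, not from the paper

a = rng.normal(0.0, 0.1, size=(d * d, d))         # a in R^{100 x 10}, entries ~ N(0, 0.1)
z = rng.normal(0.0, 0.1, size=(n, d))             # z_i in R^{10}, entries ~ N(0, 0.1)
eps = rng.normal(0.0, 1e-3, size=(n, d * d))      # noise ~ N(0, 1e-3)
x = ((z @ a.T) ** 2 + eps).reshape(n, d, d)       # input samples x_i as d x d matrices

def rkhm_loss(fx, x):
    """|| (1/n) sum_i |f(x_i) - x_i|_A^2 ||_A, with ||.||_A read as the operator norm."""
    e = fx - x                                    # shape (n, d, d)
    sq = np.einsum("nij,nik->njk", e, e)          # |f(x_i) - x_i|_A^2 = e_i^T e_i
    return np.linalg.norm(sq.mean(axis=0), ord=2)

def vv_rkhs_loss(fx, x):
    """(1/n) sum_i ||f(x_i) - x_i||_HS^2."""
    return np.mean(np.sum((fx - x) ** 2, axis=(1, 2)))
```

Passing `fx = x` returns zero for both losses, which serves as a quick sanity check that the two objectives are comparable at the autoencoder's reconstruction target.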
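
Likewise, a minimal sketch of the additional regularization term quoted in the Experiment Setup row, $\lambda_1(\|(\eta I + G_L)^{-1}\|_{\mathrm{op}} + \|G_L\|_{\mathrm{op}}) + \lambda_2 \|f_L\|_{\mathcal{M}_L}^2$. Here `G_L` stands for the Gram matrix of the last layer and `norm_fL_sq` for the squared RKHM norm of $f_L$ computed elsewhere; both names are placeholders of ours, and $\lambda_1$ is left as a required argument because its value is not given in the quoted text.

```python
import numpy as np

def regularizer(G_L, norm_fL_sq, lam1, eta=0.01, lam2=0.01):
    """lam1 * (||(eta*I + G_L)^{-1}||_op + ||G_L||_op) + lam2 * ||f_L||_{M_L}^2."""
    eye = np.eye(G_L.shape[0])
    op_norm = lambda M: np.linalg.norm(M, ord=2)   # operator (spectral) norm
    inv_term = op_norm(np.linalg.inv(eta * eye + G_L))
    return lam1 * (inv_term + op_norm(G_L)) + lam2 * norm_fL_sq
```

The defaults $\eta = 0.01$ and $\lambda_2 = 0.01$ follow the values quoted from Subsection 5.2.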