Koopman-based generalization bound: New aspect for full-rank weights
Authors: Yuka Hashimoto, Sho Sonoda, Isao Ishikawa, Atsushi Nitanda, Taiji Suzuki
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigate our bound numerically; we consider a regression problem on R^3, where the target function t is t(x) = e^{-‖2x−1‖^2}. We constructed a simple network f(x) = g(W_2 σ(W_1 x + b_1) + b_2), where W_1 ∈ R^{3×3}, W_2 ∈ R^{6×3}, b_1 ∈ R^3, b_2 ∈ R^6, g(x) = e^{-‖x‖^2}, and σ is a smooth version of Leaky ReLU proposed by Biswas et al. (2022). We created a training dataset from samples randomly drawn from the standard normal distribution. Figure 1 (a) illustrates the relationship between the generalization error and our bound O(∏_{j=1}^{L} ‖W_j‖^{s_j} / det(W_j^* W_j)^{1/4}). Here, we set s_j = (d_j + 0.1)/2. In Figure 1 (a), we can see that our bound decreases in proportion to the generalization error. In addition, we investigated the generalization property of a network with a regularization based on our bound. We considered the classification task with MNIST. For training the network, we used only n = 1000 samples to create a situation where the model is hard to generalize. We constructed a network with four dense layers and trained it with and without a regularization term ‖W_j‖ + 1/det(I + W_j^* W_j), which makes the norm of W_j small and det(I + W_j^* W_j) large. Figure 1 (b) shows the test accuracy. We can see that the regularization based on our bound leads to better generalization, which supports the validity of our bound. (A NumPy sketch of the bound quantity appears after the table.) |
| Researcher Affiliation | Collaboration | Yuka Hashimoto1,2, Sho Sonoda2, Isao Ishikawa3,2, Atsushi Nitanda4, Taiji Suzuki5,2 1 NTT, 2 RIKEN AIP, 3 Ehime University, 4 A*STAR CFAR, 5 The University of Tokyo |
| Pseudocode | No | The paper does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps for a method formatted like code. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We considered the classification task with MNIST. For training the network, we used only n = 1000 samples to create a situation where the model is hard to generalize. We considered the classification task with CIFAR-10 and AlexNet (Krizhevsky et al., 2012). |
| Dataset Splits | No | The paper mentions using a training dataset and evaluating test accuracy/loss, but it does not specify any training/validation/test dataset splits (e.g., percentages or counts for a validation set). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | Yes | All the experiments were executed with Python 3.9 and TensorFlow 2.6. |
| Experiment Setup | Yes | The weight matrices are initialized by Kaiming initialization (He et al., 2015), and we used SGD as the optimizer. In addition, we set the error function as ℓ_θ(x, y) = \|f_θ(x) − y\|^2 and added the regularization term 0.01(∏_{j=1}^{2} det(W_j^* W_j)^{-1/2} + 10 ∏_{j=1}^{2} ‖W_j‖). (A TensorFlow sketch of this regularized objective appears after the table.) |
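
The bound quantity quoted in the Research Type row, ∏_{j=1}^{L} ‖W_j‖^{s_j} / det(W_j^* W_j)^{1/4} with s_j = (d_j + 0.1)/2, can be evaluated directly from the weight matrices. The following is a minimal NumPy sketch, not the authors' code; it assumes ‖·‖ is the spectral norm and that d_j is the output dimension of W_j.

```python
import numpy as np

def bound_proxy(weights):
    """Evaluate prod_j ||W_j||^{s_j} / det(W_j^T W_j)^{1/4} with s_j = (d_j + 0.1) / 2.

    `weights` is a list of real weight matrices W_j of shape (d_out, d_in);
    taking d_j as the output dimension of W_j is an assumption of this sketch.
    """
    value = 1.0
    for W in weights:
        d_j = W.shape[0]
        s_j = (d_j + 0.1) / 2.0
        spec_norm = np.linalg.norm(W, ord=2)          # spectral norm ||W_j||
        det_quarter = np.linalg.det(W.T @ W) ** 0.25  # det(W_j^T W_j)^{1/4}
        value *= spec_norm ** s_j / det_quarter
    return value

# Toy example with the two weight shapes used in the regression experiment.
rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((3, 3)), rng.standard_normal((6, 3))
print(bound_proxy([W1, W2]))
```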
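
Similarly, the regularized objective quoted in the Experiment Setup row adds 0.01(∏_j det(W_j^* W_j)^{-1/2} + 10 ∏_j ‖W_j‖) to the squared error. The sketch below illustrates how such a term could be implemented in TensorFlow 2.6 under the same assumptions (real weights, spectral norm via SVD); it is not the paper's released code.

```python
import tensorflow as tf

def bound_regularizer(weights, coeff=0.01):
    """Sketch of coeff * (prod_j det(W_j^T W_j)^{-1/2} + 10 * prod_j ||W_j||).

    `weights` follows the paper's convention W_j in R^{d_out x d_in}; Keras Dense
    kernels are stored as (d_in, d_out), so pass their transposes.
    """
    det_term = tf.constant(1.0)
    norm_term = tf.constant(1.0)
    for W in weights:
        gram = tf.matmul(W, W, transpose_a=True)            # W_j^T W_j  (d_in x d_in)
        det_term *= tf.linalg.det(gram) ** (-0.5)           # det(W_j^T W_j)^{-1/2}
        norm_term *= tf.linalg.svd(W, compute_uv=False)[0]  # spectral norm ||W_j||
    return coeff * (det_term + 10.0 * norm_term)

# Hypothetical usage inside a training step, assuming `model` stacks Dense layers:
# loss = tf.reduce_mean((model(x)[:, 0] - y) ** 2) \
#        + bound_regularizer([tf.transpose(l.kernel) for l in model.layers if hasattr(l, "kernel")])
```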