Koopman-based generalization bound: New aspect for full-rank weights

Authors: Yuka Hashimoto, Sho Sonoda, Isao Ishikawa, Atsushi Nitanda, Taiji Suzuki

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We investigate our bound numerically. We consider a regression problem on R^3, where the target function is t(x) = exp(−‖2x − 1‖^2). We constructed a simple network f(x) = g(W_2 σ(W_1 x + b_1) + b_2), where W_1 ∈ R^{3×3}, W_2 ∈ R^{6×3}, b_1 ∈ R^3, b_2 ∈ R^6, g(x) = exp(−‖x‖^2), and σ is a smooth version of Leaky ReLU proposed by Biswas et al. (2022). We created a training dataset from samples randomly drawn from the standard normal distribution. Figure 1 (a) illustrates the relationship between the generalization error and our bound O(∏_{j=1}^{L} ‖W_j‖^{s_j} / det(W_j* W_j)^{1/4}). Here, we set s_j = (d_j + 0.1)/2. In Figure 1 (a), we can see that our bound gets smaller in proportion to the generalization error. In addition, we investigated the generalization property of a network with a regularization based on our bound. We considered the classification task with MNIST. For training the network, we used only n = 1000 samples to create a situation where the model is hard to generalize. We constructed a network with four dense layers and trained it with and without the regularization term ‖W_j‖ + 1/det(I + W_j* W_j), which makes both ‖W_j‖ and 1/det(I + W_j* W_j) small. Figure 1 (b) shows the test accuracy. We can see that the regularization based on our bound leads to better generalization, which supports the validity of our bound.
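As a hedged illustration of the two quantities quoted above, the sketch below computes the bound-style product ∏_j ‖W_j‖^{s_j} / det(W_j* W_j)^{1/4} and the MNIST regularization term ‖W_j‖ + 1/det(I + W_j* W_j) in TensorFlow (the framework reported under Software Dependencies). Interpreting ‖W_j‖ as the spectral norm and d_j as the output dimension of layer j are assumptions, and the function names are illustrative, not taken from the paper.

```python
import tensorflow as tf

def koopman_bound_proxy(weights):
    """Bound-style product prod_j ||W_j||^{s_j} / det(W_j* W_j)^{1/4}.

    weights: list of matrices W_j of shape (d_j, d_{j-1}) (math convention;
    for a Keras Dense layer, pass tf.transpose(layer.kernel)).
    """
    value = tf.constant(1.0)
    for W in weights:
        d_j = tf.cast(tf.shape(W)[0], W.dtype)      # output dimension (assumed meaning of d_j)
        s_j = (d_j + 0.1) / 2.0                     # s_j = (d_j + 0.1)/2 as in the quoted experiment
        spec_norm = tf.reduce_max(tf.linalg.svd(W, compute_uv=False))  # ||W_j||, taken as the spectral norm
        gram_det = tf.linalg.det(tf.matmul(W, W, transpose_a=True))    # det(W_j* W_j)
        value *= spec_norm ** s_j / gram_det ** 0.25
    return value

def mnist_regularizer(weights):
    """Sum over layers of ||W_j|| + 1/det(I + W_j* W_j), the term quoted for MNIST."""
    reg = tf.constant(0.0)
    for W in weights:
        gram = tf.matmul(W, W, transpose_a=True)    # W_j* W_j, shape (d_{j-1}, d_{j-1})
        eye = tf.eye(tf.shape(gram)[0], dtype=W.dtype)
        spec_norm = tf.reduce_max(tf.linalg.svd(W, compute_uv=False))
        reg += spec_norm + 1.0 / tf.linalg.det(eye + gram)
    return reg

# Example on the quoted regression architecture (W1 in R^{3x3}, W2 in R^{6x3}):
W1, W2 = tf.random.normal((3, 3)), tf.random.normal((6, 3))
print(float(koopman_bound_proxy([W1, W2])), float(mnist_regularizer([W1, W2])))
```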
Researcher Affiliation | Collaboration | Yuka Hashimoto (1,2), Sho Sonoda (2), Isao Ishikawa (3,2), Atsushi Nitanda (4), Taiji Suzuki (5,2); 1: NTT, 2: RIKEN AIP, 3: Ehime University, 4: A*STAR CFAR, 5: The University of Tokyo
Pseudocode | No | The paper does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps for a method formatted like code.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | Yes | We considered the classification task with MNIST. For training the network, we used only n = 1000 samples to create a situation where the model is hard to generalize. We considered the classification task with CIFAR-10 and AlexNet (Krizhevsky et al., 2012).
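The small-sample MNIST setting described above can be recreated in a few lines; the sketch below is an assumption about the data preparation (the seed and scaling are illustrative), not the authors' code.

```python
import numpy as np
import tensorflow as tf

# Keep only n = 1000 training samples so the model is hard to generalize;
# evaluate on the full test split.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
rng = np.random.default_rng(0)                              # arbitrary seed
idx = rng.choice(len(x_train), size=1000, replace=False)
x_small, y_small = x_train[idx] / 255.0, y_train[idx]
x_test = x_test / 255.0
```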
Dataset Splits | No | The paper mentions using a 'training dataset' and evaluating 'test accuracy/loss', but it does not specify any training/validation/test dataset splits (e.g., percentages or counts for a validation set).
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | Yes | All the experiments were executed with Python 3.9 and TensorFlow 2.6.
Experiment Setup | Yes | The weight matrices are initialized by Kaiming initialization (He et al., 2015), and we used SGD for the optimizer. In addition, we set the error function as l_θ(x, y) = |f_θ(x) − y|^2, and added the regularization term 0.01(∏_{j=1}^{2} det(W_j* W_j)^{−1/2} + 10 ∏_{j=1}^{2} ‖W_j‖).
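A hedged TensorFlow sketch of this setup is given below: He ("Kaiming") initialization, SGD, the squared-error loss, and the quoted regularization term 0.01(∏_{j=1}^{2} det(W_j* W_j)^{−1/2} + 10 ∏_{j=1}^{2} ‖W_j‖). The layer widths follow the regression network quoted earlier in the table; the learning rate, batch size, and the plain LeakyReLU (standing in for the smooth variant of Biswas et al., 2022) are assumptions.

```python
import tensorflow as tf

# Two dense layers with Kaiming (He) initialization; g(z) = exp(-||z||^2) is applied in loss_fn.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(3, kernel_initializer="he_normal"),
    tf.keras.layers.LeakyReLU(0.1),   # stand-in for the smooth Leaky ReLU of Biswas et al. (2022)
    tf.keras.layers.Dense(6, kernel_initializer="he_normal"),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)     # learning rate is an assumption

def loss_fn(x, y):
    # f_theta(x) = g(W2 sigma(W1 x + b1) + b2) with g(z) = exp(-||z||^2)
    pred = tf.exp(-tf.reduce_sum(model(x) ** 2, axis=-1))
    mse = tf.reduce_mean((pred - y) ** 2)                   # l_theta(x, y) = |f_theta(x) - y|^2
    det_prod, norm_prod = 1.0, 1.0
    for layer in model.layers:
        if not isinstance(layer, tf.keras.layers.Dense):
            continue
        K = layer.kernel                                    # shape (d_{j-1}, d_j); math convention W_j = K^T
        det_prod *= tf.linalg.det(tf.matmul(K, K, transpose_b=True)) ** -0.5  # det(W_j* W_j)^{-1/2}
        norm_prod *= tf.reduce_max(tf.linalg.svd(K, compute_uv=False))        # spectral norm ||W_j||
    return mse + 0.01 * (det_prod + 10.0 * norm_prod)

# One SGD step on synthetic data drawn as in the quoted regression experiment.
x = tf.random.normal((32, 3))
y = tf.exp(-tf.reduce_sum((2.0 * x - 1.0) ** 2, axis=-1))   # target t(x) = exp(-||2x - 1||^2)
with tf.GradientTape() as tape:
    loss = loss_fn(x, y)
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```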