Capacity Control of ReLU Neural Networks by Basis-Path Norm
Authors: Shuxin Zheng, Qi Meng, Huishuai Zhang, Wei Chen, Nenghai Yu, Tie-Yan Liu
AAAI 2019, pp. 5925-5932
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on benchmark datasets demonstrate that the proposed regularization method achieves clearly better performance on the test set than the previous regularization approaches. In this section, we study the relationship between this bound and the empirical generalization gap (the absolute difference between test error and training error) with real-data experiments. |
| Researcher Affiliation | Collaboration | University of Science and Technology of China; Microsoft Research Asia |
| Pseudocode | Yes | Algorithm 1: Optimize ReLU Network with SGD and Basis-path Regularization (a hedged training-step sketch appears after the table) |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-sourcing of its code for the described methodology. |
| Open Datasets | Yes | We conduct experiments with multi-layer perceptrons (MLP) with ReLU of different depths, widths, and global minima on the MNIST classification task... We first apply our basis-path regularization method to a recommendation task with MLP networks and conduct experimental studies based on a public dataset, MovieLens. In this section, we apply our basis-path regularization to this task and conduct experimental studies based on CIFAR10 (Krizhevsky and Hinton 2009) |
| Dataset Splits | Yes | The training set consists of 10000 randomly selected samples with true labels and up to another 5000 intentionally mislabeled samples, which are gradually added into the training set. The evaluation of error rate is conducted on a fixed 10000-sample validation set. (A data-setup sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware specifications (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions various software components and optimizers (e.g., Adam optimizer, SGD, ResNet, Plain Net) but does not provide specific version numbers for any of these dependencies. |
| Experiment Setup | Yes | More details of the training strategies can be found in the appendices. We test predictive factors of [8, 16, 32, 64], and set the number of hidden units to the embedding size 4 in each hidden layer. For each method, we perform a wide-range grid search of the hyper-parameter λ over 10^(-α) where α ∈ {5, 6, 7, 8, 9} and report the experimental results based on the best performance on the validation set (a grid-search sketch appears after the table). We train 34-layer ResNet and PlainNet networks on this dataset, and use SGD with the widely used L2 weight decay regularization (WD) as our baseline. More training details can be found in the supplementary materials. |
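
The Pseudocode row above refers to training a ReLU network with SGD plus a basis-path penalty (Algorithm 1 of the paper). Below is a minimal PyTorch sketch of that general pattern, assuming a standard regularized training step; the `penalty` function is a simple L1 stand-in, not the paper's basis-path norm, and the model size, learning rate, and λ value are illustrative choices, not taken from the paper.

```python
import torch
import torch.nn as nn

# A generic two-layer ReLU MLP for MNIST-sized inputs (illustrative sizes).
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
lam = 1e-5  # regularization strength; the paper grid-searches this value

def penalty(net):
    # Placeholder regularizer (an L1 penalty on all weights). The paper instead
    # penalizes the basis-path norm, whose construction is given in Algorithm 1.
    return sum(p.abs().sum() for p in net.parameters())

def train_step(x, y):
    # One SGD step on the task loss plus the weighted regularization term.
    optimizer.zero_grad()
    loss = criterion(model(x), y) + lam * penalty(model)
    loss.backward()
    optimizer.step()
    return loss.item()
```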
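The Dataset Splits row quotes a label-noise protocol: 10000 correctly labeled samples plus up to 5000 intentionally mislabeled ones, with error rate measured on a fixed 10000-sample validation set. The NumPy sketch below shows one way to build such a mislabeled pool; `corrupt_labels` and the slicing scheme are hypothetical and not taken from the paper.

```python
import numpy as np

def corrupt_labels(labels, n_corrupt, num_classes=10, seed=0):
    """Randomly reassign the labels of n_corrupt samples."""
    rng = np.random.default_rng(seed)
    labels = labels.copy()
    idx = rng.choice(len(labels), size=n_corrupt, replace=False)
    labels[idx] = rng.integers(0, num_classes, size=n_corrupt)
    return labels

# Example usage (y_train would be the MNIST label array):
# clean = y_train[:10000]                                       # correctly labeled
# noisy = corrupt_labels(y_train[10000:15000], n_corrupt=5000)  # mislabeled pool
# The mislabeled samples are then mixed into the training set in increments,
# while error rate is always evaluated on a fixed 10000-sample validation set.
```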
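The Experiment Setup row quotes a grid search of λ over 10^(-α) for α ∈ {5, 6, 7, 8, 9}, selected by validation performance. A minimal sketch of that selection loop, assuming a user-supplied `train_and_evaluate(lam)` callable (a hypothetical helper, not defined in the paper):

```python
def grid_search(train_and_evaluate, alphas=(5, 6, 7, 8, 9)):
    """Return the lambda with the best validation score over lambda = 10**(-alpha)."""
    best_lam, best_score = None, float("-inf")
    for alpha in alphas:
        lam = 10 ** (-alpha)
        score = train_and_evaluate(lam)  # e.g., validation accuracy for this lambda
        if score > best_score:
            best_lam, best_score = lam, score
    return best_lam, best_score
```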