Adaptive Knowledge Driven Regularization for Deep Neural Networks
Authors: Zhaojing Luo, Shaofeng Cai, Can Cui, Beng Chin Ooi, Yang Yang
AAAI 2021, pp. 8810-8818
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Extensive evaluations on diverse benchmark datasets and neural network structures show that CORR-Reg achieves significant improvement over state-of-the-art regularization methods." The Experiments section evaluates the effectiveness of CORR-Reg with diverse benchmark datasets and neural networks. "The baseline regularization methods include L1-norm regularization (L1-reg) (Williams 1995), L2-norm regularization (L2-reg) (Hastie, Tibshirani, and Friedman 2001), Max-norm (Srebro, Rennie, and Jaakkola 2005; Lee et al. 2010; Srebro and Shraibman 2005) and Dropout (Hinton et al. 2012; Srivastava et al. 2014)." A hedged sketch of these baseline regularizers is given below the table. |
| Researcher Affiliation | Academia | 1Department of Computer Science, School of Computing, National University of Singapore 2School of Computer Science and Engineering, University of Electronic Science and Technology of China {zhaojing, shaofeng, cuican, ooibc}@comp.nus.edu.sg, dlyyang@gmail.com |
| Pseudocode | Yes | Algorithm 1 Lazy Update for CORR-Reg |
| Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for CORR-Reg, nor does it include a link to a code repository. |
| Open Datasets | Yes | MNIST Dataset: This is a public database of handwritten digits... (http://yann.lecun.com/exdb/mnist/) CIFAR-10 Dataset: This is a public benchmark image classification dataset... (https://www.cs.toronto.edu/~kriz/cifar.html) MIMIC-III Dataset (Johnson et al. 2016): This is a public benchmark dataset... Sentence Polarity Dataset (Pang and Lee 2005): This is a public benchmark dataset for sentiment analysis. |
| Dataset Splits | No | For MNIST: 'The training and test datasets contain 60000 and 10000 images respectively.' For MIMIC-III: 'This dataset consists of 16248 samples, which we divide into 12379 samples for training and 3869 samples for testing.' For CIFAR-10: 'There are 50000 training and 10000 test images.' For Sentence Polarity: 'there are 5788 training samples and 1809 test samples.' While training and test splits are provided, explicit details about a separate validation split are not given for all datasets. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU, CPU models, memory) used to conduct the experiments. |
| Software Dependencies | No | The paper mentions Python and the use of SGD, but it does not provide specific version numbers for these or for any other key software dependencies required for reproducibility. |
| Experiment Setup | Yes | In all experiments, the optimizer is SGD with a momentum of 0.9. The batch size is 128. We adopt 500 training epochs for MLP and LSTM, 200 for autoencoder and LeNet, and 300 for VGG. Empirically, E is set to 3 and Ts is set to 10. Figure 3 shows the performance for different combinations of λ and β values on MNIST-AE and MIMIC-III-MLP. A hedged training-configuration sketch based on these reported values follows the table. |
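The Research Type row lists the paper's baseline regularizers: L1-reg, L2-reg, Max-norm, and Dropout. The sketch below illustrates how such baselines are typically realized in PyTorch; it is not the paper's code, and the model shape, regularization weights `lam`, dropout rate, and max-norm radius are illustrative assumptions.

```python
import torch
import torch.nn as nn

def l1_penalty(model: nn.Module, lam: float = 1e-4) -> torch.Tensor:
    """L1-norm regularization: lam times the sum of absolute weight values."""
    return lam * sum(p.abs().sum() for p in model.parameters())

def l2_penalty(model: nn.Module, lam: float = 1e-4) -> torch.Tensor:
    """L2-norm regularization: lam times the sum of squared weight values."""
    return lam * sum(p.pow(2).sum() for p in model.parameters())

@torch.no_grad()
def apply_max_norm(model: nn.Module, max_norm: float = 3.0) -> None:
    """Max-norm constraint: rescale each weight row whose L2 norm exceeds max_norm."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            norms = module.weight.norm(p=2, dim=1, keepdim=True)
            desired = norms.clamp(max=max_norm)
            module.weight.mul_(desired / (norms + 1e-12))

# Dropout is applied inside the network itself, e.g. an MLP baseline:
mlp = nn.Sequential(
    nn.Linear(784, 512), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(512, 10),
)
```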
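The Experiment Setup row reports SGD with momentum 0.9, batch size 128, and per-architecture epoch budgets (500/200/300). Below is a minimal training-loop sketch under those reported settings, assuming a PyTorch implementation; the learning rate, loss function, dataset object, and regularization weight are placeholders not specified in the quoted text.

```python
import torch
from torch.utils.data import DataLoader

def train(model, train_set, epochs=200, lr=0.01, lam=1e-4):
    # Reported settings from the paper: batch size 128, SGD with momentum 0.9,
    # and 500/200/300 epochs depending on the architecture.
    loader = DataLoader(train_set, batch_size=128, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()  # assumed loss for classification tasks
    for _ in range(epochs):
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            # An explicit regularization term (e.g. an L1 or L2 sum over
            # model.parameters(), weighted by lam) could be added to the loss here.
            loss.backward()
            optimizer.step()
    return model
```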