Adaptive Knowledge Driven Regularization for Deep Neural Networks

Authors: Zhaojing Luo, Shaofeng Cai, Can Cui, Beng Chin Ooi, Yang Yang (pp. 8810-8818)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluations on diverse benchmark datasets and neural network structures show that CORR-Reg achieves significant improvement over state-of-the-art regularization methods. Experiments: This section evaluates the effectiveness of our CORR-Reg with diverse benchmark datasets and neural networks. The baseline regularization methods include L1-norm regularization (L1-reg) (Williams 1995), L2-norm regularization (L2-reg) (Hastie, Tibshirani, and Friedman 2001), Max-norm (Srebro, Rennie, and Jaakkola 2005; Lee et al. 2010; Srebro and Shraibman 2005) and Dropout (Hinton et al. 2012; Srivastava et al. 2014).
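The four baseline regularizers named above can be sketched in a few lines. This is an illustrative NumPy sketch, not code from the paper; the function names, the weight shape, and the hyperparameter values are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 784))  # hypothetical weight matrix of one layer

def l1_penalty(W, lam=1e-4):
    # L1-reg (Williams 1995): lam * sum of absolute weights
    return lam * np.abs(W).sum()

def l2_penalty(W, lam=1e-4):
    # L2-reg (Hastie, Tibshirani, and Friedman 2001): lam * sum of squared weights
    return lam * np.square(W).sum()

def max_norm_project(W, c=3.0):
    # Max-norm (Srebro et al.): rescale any row whose L2 norm exceeds the cap c
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.minimum(1.0, c / np.maximum(norms, 1e-12))
    return W * scale

def dropout(h, p=0.5, training=True):
    # Inverted dropout (Srivastava et al. 2014): zero units with prob p,
    # rescale survivors so the expected activation is unchanged
    if not training:
        return h
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)
```

The L1/L2 penalties are added to the training loss, while max-norm is applied as a projection after each parameter update; dropout acts on activations rather than weights.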
Researcher Affiliation | Academia | 1Department of Computer Science, School of Computing, National University of Singapore; 2School of Computer Science and Engineering, University of Electronic Science and Technology of China. {zhaojing, shaofeng, cuican, ooibc}@comp.nus.edu.sg, dlyyang@gmail.com
Pseudocode | Yes | Algorithm 1: Lazy Update for CORR-Reg
Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for CORR-Reg, nor does it include a link to a code repository.
Open Datasets | Yes | MNIST (http://yann.lecun.com/exdb/mnist/): This is a public database of handwritten digits... CIFAR-10 (https://www.cs.toronto.edu/~kriz/cifar.html): This is a public benchmark image classification dataset... MIMIC-III (Johnson et al. 2016): This is a public benchmark dataset... Sentence Polarity (Pang and Lee 2005): This is a public benchmark dataset for sentiment analysis.
Dataset Splits | No | For MNIST: 'The training and test datasets contain 60000 and 10000 images respectively.' For MIMIC-III: 'This dataset consists of 16248 samples, which we divide into 12379 samples for training and 3869 samples for testing.' For CIFAR-10: 'There are 50000 training and 10000 test images.' For Sentence Polarity: 'there are 5788 training samples and 1809 test samples.' While training and test splits are provided, explicit details about a separate validation split are not given for any dataset.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU or CPU models, memory) used to conduct the experiments.
Software Dependencies | No | The paper mentions general software components such as Python and SGD but does not provide specific version numbers for these or any other key software dependencies required for reproducibility.
Experiment Setup | Yes | In all experiments, the optimizer is SGD with a momentum of 0.9 and a batch size of 128. Training runs for 500 epochs for MLP and LSTM, 200 for the autoencoder and LeNet, and 300 for VGG. Empirically, E is set to 3 and Ts is set to 10. Figure 3 shows the performance for different combinations of λ and β values on MNIST-AE and MIMIC-III-MLP.
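The reported setup can be collected into a configuration plus the standard SGD-with-momentum update it implies. This is a hedged sketch: only the hyperparameter values come from the paper; the dictionary layout, the learning rate, and the update code are our assumptions, and the roles of E and Ts are not detailed in this excerpt.

```python
import numpy as np

# Hyperparameter values as reported in the paper; structure is ours.
CONFIG = {
    "optimizer": "SGD",
    "momentum": 0.9,
    "batch_size": 128,
    "epochs": {"MLP": 500, "LSTM": 500, "autoencoder": 200, "LeNet": 200, "VGG": 300},
    "E": 3,    # paper hyperparameter; role not specified in this excerpt
    "Ts": 10,  # paper hyperparameter; role not specified in this excerpt
}

def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    # Classic SGD with momentum: v <- mu*v - lr*g; w <- w + v
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity
```

For example, iterating this update on a simple quadratic loss drives the parameter toward its minimum, with the momentum term smoothing the trajectory across steps.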