WLD-Reg: A Data-Dependent Within-Layer Diversity Regularizer
Authors: Firas Laakom, Jenni Raitoharju, Alexandros Iosifidis, Moncef Gabbouj
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present an extensive empirical study confirming that the proposed approach enhances the performance of several state-of-the-art neural network models in multiple tasks. |
| Researcher Affiliation | Academia | 1 Faculty of Information Technology and Communication Sciences, Tampere University, Finland; 2 Faculty of Information Technology, University of Jyväskylä, Finland; 3 DIGIT, Department of Electrical and Computer Engineering, Aarhus University, Denmark |
| Pseudocode | Yes | Algorithm 1: One epoch of training with WLD-Reg (an illustrative sketch of such an epoch is given below the table) |
| Open Source Code | Yes | The code is publicly available at https://github.com/firasl/AAAI-23WLD-Reg. |
| Open Datasets | Yes | CIFAR10 and CIFAR100 (Krizhevsky, Hinton et al. 2009), ImageNet-2012 classification dataset (Russakovsky et al. 2015) |
| Dataset Splits | Yes | We split the original training set (50,000) into two sets: we use the first 40,000 images as the main training set and the last 10,000 as a validation set for hyperparameter optimization (see the split sketch below the table). |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU, CPU models, or cloud computing instances) used for conducting the experiments. |
| Software Dependencies | No | The paper mentions optimizers like 'stochastic gradient descent (SGD)' but does not provide specific software dependencies or versions for libraries, frameworks, or programming languages used in the experiments. |
| Experiment Setup | Yes | All the models are trained using stochastic gradient descent (SGD) with a momentum of 0.9, weight decay of 0.0001, and a batch size of 128 for 200 epochs. The initial learning rate is set to 0.1 and is then decreased by a factor of 5 after 60, 120, and 160 epochs, respectively. For the hyperparameters, we fix λ1 = λ2 = 0.001 and γ = 10 for all the different approaches (see the configuration sketch below the table). |
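The pseudocode row only names "Algorithm 1: One epoch of training with WLD-Reg"; the exact regularizer is defined in the paper and implemented in the linked repository. The sketch below is not the authors' method but a minimal, hedged illustration of what such an epoch could look like in PyTorch, assuming a hypothetical `wld_penalty` based on pairwise similarity of within-layer unit responses, a model that returns two intermediate feature maps alongside its logits, and λ1, λ2 used as per-layer penalty weights (the role of γ is not modeled here).

```python
import torch
import torch.nn.functional as F

def wld_penalty(features):
    # features: (batch, units). Treat each unit's responses over the batch as a
    # vector and penalize the average pairwise cosine similarity between units
    # (an assumed stand-in for the paper's data-dependent diversity term).
    f = F.normalize(features.t(), dim=1)   # (units, batch), L2-normalized rows
    sim = f @ f.t()                        # (units, units) cosine similarities
    n = sim.size(0)
    off_diag = sim * (1.0 - torch.eye(n, device=sim.device))
    return off_diag.abs().sum() / (n * (n - 1))

def train_one_epoch(model, loader, optimizer, lam1=1e-3, lam2=1e-3, device="cpu"):
    # Task loss plus diversity penalties on two intermediate layers; the model
    # is assumed (for this sketch only) to return (logits, feats1, feats2).
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        logits, feats1, feats2 = model(x)
        loss = F.cross_entropy(logits, y)
        loss = loss + lam1 * wld_penalty(feats1) + lam2 * wld_penalty(feats2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```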
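The dataset split quoted above is an index-based split of the CIFAR training set. A minimal sketch, assuming torchvision is used for loading (the paper does not name its data-loading library):

```python
import torchvision
from torch.utils.data import Subset

# CIFAR-10's training set has 50,000 images; keep the first 40,000 for
# training and reserve the last 10,000 for hyperparameter validation.
full_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=torchvision.transforms.ToTensor())
train_set = Subset(full_train, range(0, 40_000))
val_set = Subset(full_train, range(40_000, 50_000))
```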
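The quoted experiment setup corresponds to a standard SGD-plus-step-schedule configuration. The sketch below is illustrative only: PyTorch is an assumption (the paper does not state its framework), `model` and a `train_loader` with batch size 128 are assumed to exist, and `train_one_epoch` is the hypothetical helper from the first sketch.

```python
import torch

# SGD with momentum 0.9 and weight decay 0.0001, initial learning rate 0.1.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)

# "Decreased by a factor of 5" after epochs 60, 120, 160 -> multiply lr by 0.2.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160], gamma=0.2)

for epoch in range(200):  # 200 epochs in total
    train_one_epoch(model, train_loader, optimizer, lam1=1e-3, lam2=1e-3)
    scheduler.step()
```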