Bayesian Sparsification of Deep C-valued Networks

Authors: Ivan Nazarov, Evgeny Burnaev

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To this end we extend Sparse Variational Dropout to complex-valued neural networks and verify the proposed Bayesian technique by conducting a large numerical study of the performance-compression trade-off of C-valued networks on two tasks: image recognition on MNIST-like and CIFAR10 datasets and music transcription on MusicNet.
Researcher Affiliation | Academia | Centre for Data Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Moscow, Russia.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks clearly labeled as 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | The source code for a package based on PyTorch (Paszke et al., 2019), which implements C-valued Sparse Variational Dropout and ARD layers and provides other basic layers for CVNNs, is available at https://github.com/ivannz/cplxmodule. (A hedged, from-scratch sketch of such a layer appears after this table.)
Open Datasets | Yes | MNIST (Lecun et al., 1998), KMNIST (Clanuwat et al., 2018), EMNIST (Cohen et al., 2017) and Fashion-MNIST (Xiao et al., 2017). CIFAR10 dataset comprising 32×32 colour images of 10 classes (Krizhevsky, 2009). MusicNet is a corpus of 330 annotated classical music recordings used for learning feature representations for music transcription tasks (Thickstun et al., 2017). (A data-loading sketch appears after this table.)
Dataset Splits | Yes | The dataset is split into the same validation and test samples and handled identically to their study.
Hardware Specification | No | The authors acknowledge the use of the Skoltech CDISE HPC cluster Zhores for obtaining the results presented in this paper. This mentions an HPC cluster but does not provide specific hardware details such as GPU or CPU models.
Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019)' as the basis for their package, but it does not provide a specific version number for PyTorch or any other software dependencies crucial for replication.
Experiment Setup | Yes | Networks are trained with the ADAM optimizer, with the learning rate reset to 10^-3 before each stage and global ℓ2-norm gradient clipping at 0.5. Stages (sec. 5.1) last for 40, 75 and 40 epochs, respectively, in each experiment. The sparsification threshold τ is fixed at 1/2, the training batch size is set to 128 and the base learning rate 10^-3 is reduced after the 10-th epoch to 10^-4 at every stage. We vary C ∈ {2^(k/2 − 3) : k = 2, ..., 38} in (2) and repeat each experiment 5 times to get a sample of compression-accuracy pairs.
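
The MNIST-like and CIFAR10 corpora listed in the Open Datasets row are all available through torchvision, so a minimal loading sketch is easy to give. The transforms, the EMNIST split and the batch handling below are illustrative assumptions rather than the authors' exact pipeline, and MusicNet, which has no torchvision loader, is omitted.

```python
# Minimal, hedged data-loading sketch: the transform and the EMNIST split are
# assumptions, not the authors' exact preprocessing.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()  # assumption: no augmentation or normalisation

train_sets = {
    "MNIST": datasets.MNIST("data", train=True, download=True, transform=to_tensor),
    "KMNIST": datasets.KMNIST("data", train=True, download=True, transform=to_tensor),
    "EMNIST": datasets.EMNIST("data", split="balanced", train=True, download=True,
                              transform=to_tensor),  # split choice is an assumption
    "Fashion-MNIST": datasets.FashionMNIST("data", train=True, download=True,
                                           transform=to_tensor),
    "CIFAR10": datasets.CIFAR10("data", train=True, download=True, transform=to_tensor),
}

# Batch size 128 matches the experiment-setup row of the table above.
loaders = {name: DataLoader(ds, batch_size=128, shuffle=True)
           for name, ds in train_sets.items()}
```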
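For the Open Source Code row, the following is a minimal from-scratch sketch of the kind of layer the cplxmodule package provides: a C-valued linear layer with sparse variational dropout, written directly in PyTorch. It does not reproduce the cplxmodule API (the class name ComplexLinearVD and its attributes are hypothetical), it assumes a circularly symmetric complex Gaussian posterior with local reparameterization, and its KL term reuses the real-valued approximation of Molchanov et al. (2017) as a stand-in for the paper's C-valued expression.

```python
# Hedged sketch of a complex-valued linear layer with variational dropout.
# Class and attribute names are hypothetical; the KL below is the real-valued
# sparse-VD approximation, NOT the paper's C-valued derivation.
import torch
import torch.nn.functional as F
from torch import nn


class ComplexLinearVD(nn.Module):
    """y = W x over C, with a factorised circular complex Gaussian posterior on W."""

    def __init__(self, in_features, out_features, threshold=1.0):
        super().__init__()
        self.w_real = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.w_imag = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.log_sigma2 = nn.Parameter(torch.full((out_features, in_features), -10.0))
        self.threshold = threshold  # illustrative log-alpha threshold (tau)

    @property
    def log_alpha(self):
        # alpha = sigma^2 / |mu|^2, clamped for numerical stability
        w_sq = self.w_real ** 2 + self.w_imag ** 2
        return (self.log_sigma2 - torch.log(w_sq + 1e-12)).clamp(-10.0, 10.0)

    def forward(self, x_real, x_imag):
        w_re, w_im = self.w_real, self.w_imag
        if not self.training:
            # Sparsify: zero out weights whose log alpha exceeds the threshold.
            mask = (self.log_alpha < self.threshold).float()
            w_re, w_im = w_re * mask, w_im * mask
        # Mean of the output: ordinary complex matrix product (a + bi)(c + di).
        mu_re = F.linear(x_real, w_re) - F.linear(x_imag, w_im)
        mu_im = F.linear(x_real, w_im) + F.linear(x_imag, w_re)
        if not self.training:
            return mu_re, mu_im
        # Local reparameterization: add circularly symmetric complex Gaussian noise
        # with per-output variance sum_j |x_j|^2 sigma_ij^2.
        var = F.linear(x_real ** 2 + x_imag ** 2, self.log_sigma2.exp())
        std = (var.clamp_min(1e-12) / 2).sqrt()
        return mu_re + std * torch.randn_like(std), mu_im + std * torch.randn_like(std)

    def kl(self):
        # Stand-in: real-valued approximation of Molchanov et al. (2017); the paper
        # derives the proper C-valued KL, which this does not reproduce.
        k1, k2, k3 = 0.63576, 1.8732, 1.48695
        la = self.log_alpha
        neg_kl = k1 * torch.sigmoid(k2 + k3 * la) - 0.5 * F.softplus(-la) - k1
        return -neg_kl.sum()
```

At evaluation time the layer zeroes every weight whose log α exceeds the threshold, which mirrors the thresholding step the paper uses to turn relevance estimates into a hard sparsity mask.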
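Finally, a sketch of the staged schedule quoted in the Experiment Setup row: ADAM with the learning rate reset to 10^-3 at the start of each stage and dropped to 10^-4 after the 10-th epoch, global ℓ2-norm gradient clipping at 0.5, batch size 128 and stages of 40, 75 and 40 epochs. The image-classification loss, the placeholder model and train_loader, and the assumption that the trade-off coefficient C scales the summed KL terms are illustrative, not taken verbatim from the paper.

```python
# Hedged sketch of the staged training schedule; `model`, `train_loader` and the
# role of the coefficient C are placeholders/assumptions.
import torch
import torch.nn.functional as F


def run_stage(model, train_loader, epochs, kl_coeff, device="cpu"):
    # Learning rate is reset to 1e-3 at the start of every stage.
    optim = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(epochs):
        if epoch == 10:  # after the 10-th epoch the rate drops to 1e-4
            for group in optim.param_groups:
                group["lr"] = 1e-4
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optim.zero_grad()
            logits = model(x)  # placeholder classifier returning class logits
            # Sum the KL of every variational-dropout layer (assumed to expose .kl()).
            kl = sum(m.kl() for m in model.modules() if hasattr(m, "kl"))
            loss = F.cross_entropy(logits, y) + kl_coeff * kl
            loss.backward()
            # Global l2-norm gradient clipping at 0.5, as in the setup row.
            torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)
            optim.step()


# Three stages lasting 40, 75 and 40 epochs, e.g. dense pre-training,
# Bayesian sparsification and fine-tuning of the masked network (an assumption
# about the stage semantics of sec. 5.1):
# for epochs, kl_coeff in ((40, 0.0), (75, C), (40, 0.0)):
#     run_stage(model, train_loader, epochs, kl_coeff)
```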