Deep Complex Networks

Authors: Chiheb Trabelsi, Olexa Bilaniuk, Ying Zhang, Dmitriy Serdyuk, Sandeep Subramanian, João Felipe Santos, Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, Christopher J. Pal

ICLR 2018

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | We test deep complex models on several computer vision tasks, on music transcription using the MusicNet dataset and on Speech Spectrum Prediction using the TIMIT dataset. We achieve state-of-the-art performance on these audio-related tasks.
Researcher Affiliation | Collaboration | Montreal Institute for Learning Algorithms (MILA), Montréal; École Polytechnique, Montréal; Microsoft Research, Montréal; Element AI, Montréal; CIFAR Senior Fellow
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code is available at http://github.com/ChihebTrabelsi/deep_complex_networks
Open Datasets | Yes | We test deep complex models on several computer vision tasks, on music transcription using the MusicNet dataset and on Speech Spectrum Prediction using the TIMIT dataset. We perform a sanity check of our deep complex network and demonstrate its effectiveness on standard image classification benchmarks, specifically, CIFAR-10, CIFAR-100. We also use a reduced-training set of SVHN that we call SVHN*.
Dataset Splits | Yes | For computational reasons, we use the required 73,257 training images of Street View House Numbers (SVHN). We still test on all 26,032 images. Following Thickstun et al. (2016) we used recordings with ids 2303, 2382, 1819 as the test subset and additionally we created a validation subset using recording ids 2131, 2384, 1792, 2514, 2567, 1876 (randomly chosen from the training set). The validation subset was used for model selection and early stopping. The remaining 321 files were used for training. We use a training set with 3690 utterances, a validation set with 400 utterances and a standard test set with 192 utterances.
Hardware Specification | Yes | We would also like to acknowledge NVIDIA for donating a DGX-1 computer used in this work.
Software Dependencies | No | We are grateful to the developers of Theano (Theano Development Team, 2016) and Keras (Chollet et al., 2015).
Experiment Setup | Yes | All models (real and complex) were trained using the backpropagation algorithm with Stochastic Gradient Descent with Nesterov momentum (Nesterov, 1983) set at 0.9. We also clip the norm of our gradients to 1. We tweaked the learning rate schedule used in He et al. (2016)... We start our learning rate at 0.01 for the first 10 epochs to warm up the training and then set it at 0.1 from epoch 10-100 and then anneal the learning rates by a factor of 10 at epochs 120 and 150. We end the training at epoch 200.
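The training recipe quoted in the Experiment Setup row can be sketched as a small epoch-indexed schedule. This is a hypothetical reconstruction, not the authors' code: the quote leaves the learning rate between epochs 100 and 120 unstated, so 0.1 is assumed there, and the Keras optimizer line in the comment merely mirrors the stated momentum and gradient-clipping settings.

```python
def learning_rate(epoch):
    """Piecewise learning-rate schedule, as interpreted from the quote:
    0.01 warm-up for the first 10 epochs, 0.1 afterwards, divided by 10
    at epochs 120 and 150; training ends at epoch 200."""
    if epoch < 10:
        return 0.01   # warm-up phase (epochs 0-9)
    if epoch < 120:
        return 0.1    # main phase (0.1 from epoch 10; 100-120 assumed unchanged)
    if epoch < 150:
        return 0.01   # first anneal by a factor of 10 at epoch 120
    return 0.001      # second anneal at epoch 150, kept until epoch 200

# With Keras (which the acknowledgements suggest the authors used), the
# optimizer and schedule would be wired up roughly as:
#   opt = keras.optimizers.SGD(lr=0.01, momentum=0.9, nesterov=True, clipnorm=1.0)
#   cb = keras.callbacks.LearningRateScheduler(learning_rate)
```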