Boosting Dilated Convolutional Networks with Mixed Tensor Decompositions

Authors: Nadav Cohen, Ronen Tamari, Amnon Shashua

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluation demonstrates how the expressive efficiency of connectivity, similarly to that of depth, translates into gains in accuracy. An experiment on the TIMIT speech corpus (Garofolo et al. (1993)) evaluates the dilated convolutional network architectures covered by the analysis.
Researcher Affiliation | Collaboration | Nadav Cohen (Institute for Advanced Study, cohennadav@ias.edu); Ronen Tamari (The Hebrew University of Jerusalem, ronent@cs.huji.ac.il); Amnon Shashua (The Hebrew University of Jerusalem, shashua@cs.huji.ac.il)
Pseudocode | No | The paper provides formal mathematical definitions and equations (e.g., eq. 4) but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper names the framework used for the experiments ("The framework chosen for running the experiment was Caffe toolbox (Jia et al. (2014))"), but provides no statement or link indicating that the authors' own code is open-sourced.
Open Datasets | Yes | We trained a baseline dilated convolutional network N... to classify individual phonemes in the TIMIT acoustic speech corpus (Garofolo et al. (1993)).
Dataset Splits | Yes | We split the data into train and validation sets in accordance with Halberstadt (1998).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed machine specifications) used for running its experiments.
Software Dependencies | No | The paper mentions software such as the Caffe toolbox (Jia et al. (2014)) and the Adam optimizer (Kingma and Ba (2014)) but does not provide specific version numbers for these components.
Experiment Setup | Yes | In accordance with WaveNet, the baseline dilated convolutional network had ReLU activation (g(a, b) = max{a + b, 0}; see sec. 3.1), 32 channels per layer, and input vectors of dimension 256 holding one-hot quantizations of the audio signal. The number of layers L was set to 12, corresponding to an input window of N = 2^L = 4096 samples, spanning 250ms of audio signal, standard practice with the TIMIT dataset. The framework chosen for running the experiment was the Caffe toolbox (Jia et al. (2014)), and the Adam optimizer (Kingma and Ba (2014)) was used for training (with default hyper-parameters: moment decay rates β1 = 0.9, β2 = 0.999; learning rate α = 0.001). Weight decay and batch size were set to 10^-5 and 128 respectively. Models were trained for 35000 iterations, with the learning rate decreased by a factor of 10 after 80% of the iterations took place.
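For concreteness, the arithmetic behind this setup can be sketched in Python. This is a hypothetical illustration, not the authors' code: the function names are invented here, the 16 kHz sampling rate is an assumption (standard for TIMIT), and the step schedule mirrors the "drop by 10× after 80% of iterations" description above.

```python
# Hypothetical sketch of the reported setup (not the authors' code).

def receptive_field(num_layers: int) -> int:
    """Input window of a binary-tree dilated stack: N = 2^L (L = 12 -> 4096)."""
    return 2 ** num_layers

def relu_pool(a: float, b: float) -> float:
    """The paper's activation g(a, b) = max{a + b, 0} (sec. 3.1)."""
    return max(a + b, 0.0)

def learning_rate(iteration: int, base_lr: float = 0.001,
                  total_iters: int = 35000, drop_frac: float = 0.8,
                  factor: float = 10.0) -> float:
    """Step schedule: divide base_lr by `factor` after 80% of iterations."""
    return base_lr if iteration < drop_frac * total_iters else base_lr / factor

window = receptive_field(12)
print(window)                      # → 4096 samples
print(window / 16000.0)            # ~0.25 s of audio, assuming 16 kHz sampling
print(learning_rate(1000))         # base rate before the drop
print(learning_rate(30000))        # rate after the 80% mark (28000 iterations)
```

The 4096-sample window matching roughly 250ms confirms the paper's claim under the assumed 16 kHz TIMIT sampling rate.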