Boosting Dilated Convolutional Networks with Mixed Tensor Decompositions
Authors: Nadav Cohen, Ronen Tamari, Amnon Shashua
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation demonstrates how the expressive efficiency of connectivity, similarly to that of depth, translates into gains in accuracy. An experiment on the TIMIT speech corpus (Garofolo et al. (1993)) evaluates the dilated convolutional network architectures covered by the analysis. |
| Researcher Affiliation | Collaboration | Nadav Cohen, Institute for Advanced Study (cohennadav@ias.edu); Ronen Tamari, The Hebrew University of Jerusalem (ronent@cs.huji.ac.il); Amnon Shashua, The Hebrew University of Jerusalem (shashua@cs.huji.ac.il) |
| Pseudocode | No | The paper provides formal mathematical definitions and equations (e.g., eq. 4) but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states the framework used for experiments ('The framework chosen for running the experiment was Caffe toolbox (Jia et al. (2014))'), but does not provide any statement or link indicating that the authors' own code is open-sourced. |
| Open Datasets | Yes | We trained a baseline dilated convolutional network N... to classify individual phonemes in the TIMIT acoustic speech corpus (Garofolo et al. (1993)). |
| Dataset Splits | Yes | We split the data into train and validation sets in accordance with Halberstadt (1998) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like 'Caffe toolbox (Jia et al. (2014))' and 'Adam optimizer (Kingma and Ba (2014))' but does not provide specific version numbers for these components. |
| Experiment Setup | Yes | In accordance with WaveNet, the baseline dilated convolutional network had ReLU activation (g(a, b) = max{a + b, 0}; see sec. 3.1), 32 channels per layer, and input vectors of dimension 256 holding one-hot quantizations of the audio signal. The number of layers L was set to 12, corresponding to an input window of N = 2^L = 4096 samples, spanning 250ms of audio signal, standard practice with the TIMIT dataset. The framework chosen for running the experiment was the Caffe toolbox (Jia et al. (2014)), and the Adam optimizer (Kingma and Ba (2014)) was used for training (with default hyper-parameters: moment decay rates β1 = 0.9, β2 = 0.999; learning rate α = 0.001). Weight decay and batch size were set to 10^-5 and 128 respectively. Models were trained for 35000 iterations, with the learning rate decreased by a factor of 10 after 80% of the iterations took place. (A configuration sketch based on these reported settings is given below the table.) |
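
The reported setup is specific enough to restate as a short configuration sketch. The plain-Python snippet below is a hedged illustration only: it collects the hyper-parameters quoted above and the learning-rate schedule they imply. All names in it (`config`, `learning_rate`, `lr_drop_iteration`) are hypothetical and are not taken from the authors' Caffe configuration.

```python
# Hedged sketch: the training hyper-parameters reported for the baseline
# dilated convolutional network, restated as plain Python. The authors ran
# the experiment with the Caffe toolbox; this script only collects the
# reported numbers and derived quantities.

L = 12            # number of layers
N = 2 ** L        # input window: 4096 samples (~256 ms at TIMIT's 16 kHz
                  # sampling rate, reported as ~250 ms in the paper)

config = {
    "activation": "ReLU",            # g(a, b) = max{a + b, 0}
    "channels_per_layer": 32,
    "input_dim": 256,                # one-hot quantization of the audio signal
    "num_layers": L,
    "input_window_samples": N,
    "optimizer": "Adam",
    "beta1": 0.9,
    "beta2": 0.999,
    "base_learning_rate": 1e-3,
    "weight_decay": 1e-5,
    "batch_size": 128,
    "max_iterations": 35_000,
}

# The learning rate is decreased by a factor of 10 after 80% of iterations.
lr_drop_iteration = int(0.8 * config["max_iterations"])   # 28000


def learning_rate(iteration: int) -> float:
    """Piecewise-constant schedule implied by the reported settings."""
    lr = config["base_learning_rate"]
    return lr * 0.1 if iteration >= lr_drop_iteration else lr


if __name__ == "__main__":
    print(f"Input window: {N} samples")
    print(f"LR at iteration 10000: {learning_rate(10_000):g}")
    print(f"LR at iteration 30000: {learning_rate(30_000):g}")
```

Under the reported numbers, the learning-rate drop falls at iteration 28000 (80% of 35000), after which training continues at 10^-4 for the remaining 7000 iterations.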