Quaternion Recurrent Neural Networks
Authors: Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Chiheb Trabelsi, Renato De Mori, Yoshua Bengio
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness of the proposed QRNN and QLSTM is evaluated on the realistic TIMIT phoneme recognition task (Section 4.2), which shows that both the QRNN and QLSTM obtain better performance than RNNs and LSTMs, with best observed phoneme error rates (PER) of 18.5% and 15.1% for the QRNN and QLSTM, compared to 19.0% and 15.3% for the RNN and LSTM. |
| Researcher Affiliation | Collaboration | 1 LIA, Université d'Avignon, France; 2 MILA, Université de Montréal, Québec, Canada; 3 McGill University, Québec, Canada; 4 Orkis, Aix-en-Provence, France; 5 Element AI, Montréal, Québec, Canada |
| Pseudocode | Yes | Algorithm 1: Quaternion-valued weight initialization (a hedged sketch follows the table). |
| Open Source Code | Yes | https://github.com/Orkis-Research/Pytorch-Quaternion-Neural-Networks |
| Open Datasets | Yes | The training process is based on the standard 3,696 sentences uttered by 462 speakers, while testing is conducted on 192 sentences uttered by 24 speakers of the TIMIT (Garofolo et al., 1993) dataset. |
| Dataset Splits | Yes | The training process is based on the standard 3,696 sentences uttered by 462 speakers, while testing is conducted on 192 sentences uttered by 24 speakers of the TIMIT (Garofolo et al., 1993) dataset. A validation set composed of 400 sentences uttered by 50 speakers is used for hyper-parameter tuning. |
| Hardware Specification | No | The paper mentions 'low computational power devices like smartphones' and 'CUDA kernel' in discussions, but does not specify the exact hardware (e.g., specific GPU or CPU models) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'pytorch-kaldi', 'Kaldi', and the 'Adam' optimizer, but does not provide version numbers for these software components. |
| Experiment Setup | Yes | Models are optimized with RMSprop with vanilla hyper-parameters and an initial learning rate of 8·10⁻⁴. The learning rate is progressively annealed using a halving factor of 0.5, applied when no performance improvement on the validation set is observed. The models are trained for 25 epochs. A dropout rate of 0.2 is applied over all the hidden layers. (See the training-setup sketch after the table.) |
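
The "Pseudocode" row points to Algorithm 1, the quaternion-valued weight initialization. As a rough illustration of what that algorithm computes, here is a minimal NumPy sketch, assuming a Glorot-style variance criterion, a Rayleigh-distributed magnitude, an angle sampled uniformly in [-π, π], and a random purely imaginary unit quaternion. The function name `quaternion_init` and the exact constants are our own; the authors' released PyTorch code at the repository above should be treated as the reference implementation.

```python
import numpy as np

def quaternion_init(n_in, n_out, seed=0):
    """Sketch of a quaternion-valued weight initialization (cf. Algorithm 1).

    Assumed details: sigma = 1 / sqrt(2 * (n_in + n_out)) (Glorot-style),
    magnitude ~ Rayleigh(sigma), theta ~ Uniform(-pi, pi), and a random
    purely imaginary unit quaternion u. Constants may differ from the
    authors' released code.
    """
    rng = np.random.default_rng(seed)
    shape = (n_in, n_out)

    sigma = 1.0 / np.sqrt(2.0 * (n_in + n_out))         # Glorot-style criterion (assumed)
    magnitude = rng.rayleigh(scale=sigma, size=shape)   # |w| ~ Rayleigh(sigma)
    theta = rng.uniform(-np.pi, np.pi, size=shape)      # rotation angle

    # Random purely imaginary unit quaternion u = (u_i, u_j, u_k), ||u|| = 1.
    u = rng.uniform(-1.0, 1.0, size=(3,) + shape)
    u /= np.linalg.norm(u, axis=0, keepdims=True)

    # w = |w| * exp(u * theta) = |w| * (cos(theta) + u * sin(theta))
    w_r = magnitude * np.cos(theta)
    w_i = magnitude * u[0] * np.sin(theta)
    w_j = magnitude * u[1] * np.sin(theta)
    w_k = magnitude * u[2] * np.sin(theta)
    return w_r, w_i, w_j, w_k

# Example: initialize the four components of a 256x256 quaternion weight matrix.
w_r, w_i, w_j, w_k = quaternion_init(256, 256)
```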
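
The "Experiment Setup" row describes RMSprop with an initial learning rate of 8·10⁻⁴, learning-rate halving when the validation score stops improving, 25 training epochs, and a dropout rate of 0.2. A minimal PyTorch sketch of that schedule follows, assuming `ReduceLROnPlateau` as a stand-in for the paper's halving rule and a placeholder LSTM in place of the actual QRNN/QLSTM acoustic models trained through pytorch-kaldi.

```python
import torch

# Placeholder model: the paper trains QRNN/QLSTM acoustic models via pytorch-kaldi;
# an LSTM with dropout 0.2 stands in here purely to make the sketch runnable.
model = torch.nn.LSTM(input_size=40, hidden_size=1024, num_layers=4, dropout=0.2)

# RMSprop with "vanilla" hyper-parameters and the paper's initial learning rate.
optimizer = torch.optim.RMSprop(model.parameters(), lr=8e-4)

# Halve the learning rate when the validation metric stops improving.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.5)

for epoch in range(25):          # the paper trains for 25 epochs
    # train_one_epoch(model, optimizer)   # placeholder for the actual training loop
    val_loss = 0.0                        # placeholder validation metric
    scheduler.step(val_loss)              # anneal the learning rate on a validation plateau
```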