Time-aware Large Kernel Convolutions
Authors: Vasileios Lioutas, Yuhong Guo
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed method on large-scale standard machine translation, abstractive summarization and language modeling datasets and show that TaLK Convolutions constitute an efficient improvement over other attention/convolution based approaches. |
| Researcher Affiliation | Academia | Vasileios Lioutas 1 Yuhong Guo 1 1School of Computer Science, Carleton University, Canada. |
| Pseudocode | No | No structured pseudocode or algorithm blocks are present in the paper. |
| Open Source Code | Yes | Our code and pre-trained models are available at github.com/lioutasb/TaLKConvolutions. |
| Open Datasets | Yes | Machine Translation On the machine translation task, we report results on three mainstream benchmark datasets: WMT English to German (En-De), WMT English to French (En-Fr) and IWSLT German to English (De-En). ... Abstractive Summarization For the abstractive summarization task, we decided to experiment with the CNN-Daily Mail (Hermann et al., 2015; Nallapati et al., 2016) dataset. ... Language Modeling We experimented on the WikiText-103 (Merity et al., 2017) benchmark dataset. |
| Dataset Splits | Yes | For the WMT En-De we used the WMT 16 training data that consists of 4.5M sentence pairs. We validated on newstest2013 and tested on newstest2014. For the WMT En-Fr, we used 36M training sentence pairs from WMT 14. We validated on newstest2012+2013 and tested on newstest2014 evaluation datasets. |
| Hardware Specification | Yes | We trained the WMT En-De, WMT En-Fr, CNN-Daily Mail and WikiText-103 models on 8 NVIDIA RTX 2080 Ti GPUs using mixed-precision training (Micikevicius et al., 2018) and the IWSLT De-En model using a single GPU. |
| Software Dependencies | No | The paper mentions 'CUDA implementation', 'PyTorch layer', and 'Fairseq toolkit' but does not provide specific version numbers for any of these software components. |
| Experiment Setup | Yes | For the machine translation models, we followed the same hyper-parameter setup as described in Wu et al. (2019). Specifically, for the WMT En-De and WMT En-Fr datasets the model hidden size d was set to 1024, the feed-forward hidden size d_ff was set to 4096, and the number of layers for the encoder and the decoder was set to 7 and 6 respectively. The number of heads was set to 16 and the l_max, r_max values to 3, 7, 15, 31×4 for each layer. For IWSLT De-En, the model hidden size d was set to 512, the feed-forward hidden size d_ff was set to 1024, and the number of layers for the encoder and the decoder was set to 7 and 6 respectively. The number of heads was set to 4 and the l_max, r_max values to 1, 3, 7, 15×4 for each layer. (A hedged configuration sketch collecting these values follows the table.) |
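
For quick reference, the hyper-parameters quoted in the Experiment Setup row can be gathered into one structure. The sketch below is ours, not the authors' released configuration: the `talk_conv_configs` name and the dictionary keys are illustrative, and expanding `3, 7, 15, 31×4` (resp. `1, 3, 7, 15×4`) into seven per-layer values assumes the last value repeats four times over the remaining encoder layers, following the notation of Wu et al. (2019).

```python
# Illustrative summary of the hyper-parameters reported in the paper.
# The names used here are ours and do not mirror the authors' released
# configuration files or Fairseq arguments.

talk_conv_configs = {
    "wmt_en_de_and_en_fr": {
        "model_hidden_size": 1024,   # d
        "ffn_hidden_size": 4096,     # d_ff
        "encoder_layers": 7,
        "decoder_layers": 6,
        "num_heads": 16,
        # l_max / r_max per layer: "3, 7, 15, 31×4" (our expansion of the ×4 shorthand)
        "max_offsets_per_layer": [3, 7, 15, 31, 31, 31, 31],
    },
    "iwslt_de_en": {
        "model_hidden_size": 512,    # d
        "ffn_hidden_size": 1024,     # d_ff
        "encoder_layers": 7,
        "decoder_layers": 6,
        "num_heads": 4,
        # l_max / r_max per layer: "1, 3, 7, 15×4" (our expansion of the ×4 shorthand)
        "max_offsets_per_layer": [1, 3, 7, 15, 15, 15, 15],
    },
}

if __name__ == "__main__":
    for name, cfg in talk_conv_configs.items():
        # Sanity check under our reading that the offset schedule covers the encoder layers.
        assert len(cfg["max_offsets_per_layer"]) == cfg["encoder_layers"]
        print(name, cfg)
```

In the released code these values would be supplied through the Fairseq training setup mentioned in the paper; the dictionary above is only a compact restatement of what the table reports.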