Time-aware Large Kernel Convolutions

Authors: Vasileios Lioutas, Yuhong Guo

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed method on large-scale standard machine translation, abstractive summarization and language modeling datasets and show that TaLK Convolutions constitute an efficient improvement over other attention/convolution based approaches.
Researcher Affiliation | Academia | Vasileios Lioutas 1, Yuhong Guo 1; 1 School of Computer Science, Carleton University, Canada.
Pseudocode | No | No structured pseudocode or algorithm blocks are present in the paper.
Open Source Code | Yes | Our code and pre-trained models are available at github.com/lioutasb/TaLKConvolutions.
Open Datasets | Yes | Machine Translation: On the machine translation task, we report results on three mainstream benchmark datasets: WMT English to German (En-De), WMT English to French (En-Fr) and IWSLT German to English (De-En). ... Abstractive Summarization: For the abstractive summarization task, we decided to experiment with the CNN-DailyMail (Hermann et al., 2015; Nallapati et al., 2016) dataset. ... Language Modeling: We experimented on the WikiText-103 (Merity et al., 2017) benchmark dataset.
Dataset Splits | Yes | For the WMT En-De we used the WMT 16 training data that consists of 4.5M sentence pairs. We validated on newstest2013 and tested on newstest2014. For the WMT En-Fr, we used 36M training sentence pairs from WMT 14. We validated on newstest2012+2013 and tested on newstest2014 evaluation datasets. (These splits are restated as a compact mapping below the table.)
Hardware Specification | Yes | We trained the WMT En-De, WMT En-Fr, CNN-DailyMail and WikiText-103 models on 8 NVIDIA RTX 2080 Ti GPUs using mixed-precision training (Micikevicius et al., 2018) and the IWSLT De-En model using a single GPU. (A generic mixed-precision training sketch is given below the table for illustration.)
Software Dependencies | No | The paper mentions a 'CUDA implementation', a 'PyTorch layer' and the 'Fairseq toolkit' but does not provide specific version numbers for any of these software components.
Experiment Setup | Yes | For the machine translation models, we followed the same hyper-parameter setup as described in Wu et al. (2019). Specifically, for the WMT En-De and WMT En-Fr datasets the model hidden size d was set to 1024, the feed-forward hidden size dff was set to 4096 and the number of layers for the encoder and the decoder was set to 7 and 6 respectively. The number of heads was set to 16 and the lmax, rmax values to [3, 7, 15, 31×4] for each layer. For IWSLT De-En, the model hidden size d was set to 512, the feed-forward hidden size dff was set to 1024 and the number of layers for the encoder and the decoder was set to 7 and 6 respectively. The number of heads was set to 4 and the lmax, rmax values to [1, 3, 7, 15×4] for each layer. (These values are collected in the configuration sketch below the table.)
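The machine-translation splits quoted in the Dataset Splits row can be restated as a compact mapping, which is easier to scan than the quoted prose. This is our own illustrative summary of the excerpt; the key names are not identifiers from the paper, and the IWSLT De-En, summarization and language-modeling splits are omitted because they are not detailed in the quote.

```python
# Illustrative restatement of the machine-translation splits quoted in the
# Dataset Splits row; key names are our own, not identifiers from the paper.
MT_SPLITS = {
    "WMT En-De": {
        "train": "WMT 16 training data (4.5M sentence pairs)",
        "valid": "newstest2013",
        "test": "newstest2014",
    },
    "WMT En-Fr": {
        "train": "WMT 14 training data (36M sentence pairs)",
        "valid": "newstest2012+2013",
        "test": "newstest2014",
    },
}
```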
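The Hardware Specification row cites mixed-precision training (Micikevicius et al., 2018). The authors' runs were driven through Fairseq, which handles this internally, so the sketch below is only a generic PyTorch torch.cuda.amp training step that illustrates the technique; it is not the authors' training loop, and the function and argument names are our own.

```python
import torch

# Generic mixed-precision training step using torch.cuda.amp, shown only to
# illustrate the technique referenced in the Hardware Specification row.
# This is not the authors' Fairseq training loop.
def train_step(model, batch, targets, optimizer, scaler, loss_fn):
    optimizer.zero_grad()
    # The forward pass runs eligible ops in float16 for speed and memory savings.
    with torch.cuda.amp.autocast():
        output = model(batch)
        loss = loss_fn(output, targets)
    # Scale the loss so float16 gradients do not underflow, then unscale
    # before the optimizer update and adjust the scale factor for the next step.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()

# scaler = torch.cuda.amp.GradScaler()  # created once and reused across steps
```

In Fairseq-based setups this is typically enabled through the toolkit's fp16 training option rather than a hand-written loop.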
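For reference, the translation hyper-parameters quoted in the Experiment Setup row are collected into a small configuration sketch below. This is only an illustration of the reported values; the field names (hidden_size, ffn_size, max_offsets, etc.) are our own labels, not identifiers from the authors' Fairseq-based code at github.com/lioutasb/TaLKConvolutions.

```python
# Minimal sketch of the translation hyper-parameters reported in the paper.
# Field names are illustrative; the authors' code may use different identifiers.
from dataclasses import dataclass
from typing import List

@dataclass
class TaLKTranslationConfig:
    hidden_size: int        # model dimension d
    ffn_size: int           # feed-forward dimension dff
    encoder_layers: int
    decoder_layers: int
    heads: int
    max_offsets: List[int]  # per-layer lmax/rmax values as quoted

# WMT En-De / WMT En-Fr setup reported in the paper
wmt_config = TaLKTranslationConfig(
    hidden_size=1024,
    ffn_size=4096,
    encoder_layers=7,
    decoder_layers=6,
    heads=16,
    max_offsets=[3, 7, 15] + [31] * 4,  # [3, 7, 15, 31×4]
)

# IWSLT De-En setup reported in the paper
iwslt_config = TaLKTranslationConfig(
    hidden_size=512,
    ffn_size=1024,
    encoder_layers=7,
    decoder_layers=6,
    heads=4,
    max_offsets=[1, 3, 7] + [15] * 4,   # [1, 3, 7, 15×4]
)
```

The expansion of "31×4" and "15×4" into four repeated values follows the per-layer kernel-size convention of Wu et al. (2019), which the paper states it adopts.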