Time-aware Large Kernel Convolutions

Authors: Vasileios Lioutas, Yuhong Guo

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed method on large-scale standard machine translation, abstractive summarization and language modeling datasets and show that TaLK Convolutions constitute an efficient improvement over other attention/convolution based approaches.
Researcher Affiliation | Academia | Vasileios Lioutas 1, Yuhong Guo 1; 1 School of Computer Science, Carleton University, Canada.
Pseudocode | No | No structured pseudocode or algorithm blocks are present in the paper.
Open Source Code | Yes | Our code and pre-trained models are available at github.com/lioutasb/TaLKConvolutions.
Open Datasets | Yes | Machine Translation: On the machine translation task, we report results on three mainstream benchmark datasets: WMT English to German (En-De), WMT English to French (En-Fr) and IWSLT German to English (De-En). ... Abstractive Summarization: For the abstractive summarization task, we decided to experiment with the CNN-DailyMail (Hermann et al., 2015; Nallapati et al., 2016) dataset. ... Language Modeling: We experimented on the WikiText-103 (Merity et al., 2017) benchmark dataset.
Dataset Splits | Yes | For the WMT En-De we used the WMT 16 training data that consists of 4.5M sentence pairs. We validated on newstest2013 and tested on newstest2014. For the WMT En-Fr, we used 36M training sentence pairs from WMT 14. We validated on newstest2012+2013 and tested on newstest2014 evaluation datasets. (These splits are restated as a compact mapping below the table.)
Hardware Specification | Yes | We trained the WMT En-De, WMT En-Fr, CNN-DailyMail and WikiText-103 models on 8 NVIDIA RTX 2080 Ti GPUs using mixed-precision training (Micikevicius et al., 2018) and the IWSLT De-En model using a single GPU. (A generic mixed-precision training sketch is given below the table for illustration.)
Software Dependencies | No | The paper mentions a 'CUDA implementation', a 'PyTorch layer' and the 'Fairseq toolkit' but does not provide specific version numbers for any of these software components.
Experiment Setup | Yes | For the machine translation models, we followed the same hyper-parameter setup as described in Wu et al. (2019). Specifically, for the WMT En-De and WMT En-Fr datasets the model hidden size d was set to 1024, the feed-forward hidden size dff was set to 4096 and the number of layers for the encoder and the decoder was set to 7 and 6 respectively. The number of heads was set to 16 and the lmax, rmax values to [3, 7, 15, 31×4] for each layer. For IWSLT De-En, the model hidden size d was set to 512, the feed-forward hidden size dff was set to 1024 and the number of layers for the encoder and the decoder was set to 7 and 6 respectively. The number of heads was set to 4 and the lmax, rmax values to [1, 3, 7, 15×4] for each layer. (These values are collected in the configuration sketch below the table.)
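The machine-translation splits quoted in the Dataset Splits row can be restated as a compact mapping, which is easier to scan than the quoted prose. This is our own illustrative summary of the excerpt; the key names are not identifiers from the paper, and the IWSLT De-En, summarization and language-modeling splits are omitted because they are not detailed in the quote.

```python
# Illustrative restatement of the machine-translation splits quoted in the
# Dataset Splits row; key names are our own, not identifiers from the paper.
MT_SPLITS = {
    "WMT En-De": {
        "train": "WMT 16 training data (4.5M sentence pairs)",
        "valid": "newstest2013",
        "test": "newstest2014",
    },
    "WMT En-Fr": {
        "train": "WMT 14 training data (36M sentence pairs)",
        "valid": "newstest2012+2013",
        "test": "newstest2014",
    },
}
```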
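The Hardware Specification row cites mixed-precision training (Micikevicius et al., 2018). The authors' runs were driven through Fairseq, which handles this internally, so the sketch below is only a generic PyTorch torch.cuda.amp training step that illustrates the technique; it is not the authors' training loop, and the function and argument names are our own.

```python
import torch

# Generic mixed-precision training step using torch.cuda.amp, shown only to
# illustrate the technique referenced in the Hardware Specification row.
# This is not the authors' Fairseq training loop.
def train_step(model, batch, targets, optimizer, scaler, loss_fn):
    optimizer.zero_grad()
    # The forward pass runs eligible ops in float16 for speed and memory savings.
    with torch.cuda.amp.autocast():
        output = model(batch)
        loss = loss_fn(output, targets)
    # Scale the loss so float16 gradients do not underflow, then unscale
    # before the optimizer update and adjust the scale factor for the next step.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()

# scaler = torch.cuda.amp.GradScaler()  # created once and reused across steps
```

In Fairseq-based setups this is typically enabled through the toolkit's fp16 training option rather than a hand-written loop.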
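For reference, the translation hyper-parameters quoted in the Experiment Setup row are collected into a small configuration sketch below. This is only an illustration of the reported values; the field names (hidden_size, ffn_size, max_offsets, etc.) are our own labels, not identifiers from the authors' Fairseq-based code at github.com/lioutasb/TaLKConvolutions.

```python
# Minimal sketch of the translation hyper-parameters reported in the paper.
# Field names are illustrative; the authors' code may use different identifiers.
from dataclasses import dataclass
from typing import List

@dataclass
class TaLKTranslationConfig:
    hidden_size: int        # model dimension d
    ffn_size: int           # feed-forward dimension dff
    encoder_layers: int
    decoder_layers: int
    heads: int
    max_offsets: List[int]  # per-layer lmax/rmax values as quoted

# WMT En-De / WMT En-Fr setup reported in the paper
wmt_config = TaLKTranslationConfig(
    hidden_size=1024,
    ffn_size=4096,
    encoder_layers=7,
    decoder_layers=6,
    heads=16,
    max_offsets=[3, 7, 15] + [31] * 4,  # [3, 7, 15, 31×4]
)

# IWSLT De-En setup reported in the paper
iwslt_config = TaLKTranslationConfig(
    hidden_size=512,
    ffn_size=1024,
    encoder_layers=7,
    decoder_layers=6,
    heads=4,
    max_offsets=[1, 3, 7] + [15] * 4,   # [1, 3, 7, 15×4]
)
```

The expansion of "31×4" and "15×4" into four repeated values follows the per-layer kernel-size convention of Wu et al. (2019), which the paper states it adopts.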