R-Drop: Regularized Dropout for Neural Networks
Authors: Xiaobo Liang, Lijun Wu, Juntao Li, Yue Wang, Qi Meng, Tao Qin, Wei Chen, Min Zhang, Tie-Yan Liu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on 5 widely used deep learning tasks (18 datasets in total), including neural machine translation, abstractive summarization, language understanding, language modeling, and image classification, show that R-Drop is universally effective. |
| Researcher Affiliation | Collaboration | 1Soochow University, 2Microsoft Research Asia |
| Pseudocode | Yes | Algorithm 1 R-Drop Training Algorithm |
| Open Source Code | Yes | Our code is available at GitHub: https://github.com/dropreg/R-Drop |
| Open Datasets | Yes | Datasets The datasets of low-resource scenario are from IWSLT competitions, which include IWSLT14 English→German (En→De), English→Spanish (En→Es), and IWSLT17 English→French (En→Fr), English→Chinese (En→Zh) translations. The rich-resource datasets come from the widely acknowledged WMT translation tasks, and we take the WMT14 English→German and English→French tasks. The GLUE [61] benchmark... CNN/Daily Mail dataset originally introduced by Hermann et al. [22]... Wikitext-103 dataset [41]... CIFAR-100 [31] and the ILSVRC-2012 ImageNet dataset [8]. |
| Dataset Splits | Yes | The IWSLT datasets contain about 170k training sentence pairs, 7k valid pairs, and 7k test pairs. The WMT data sizes are 4.5M, 36M for En→De and En→Fr respectively; valid and test data are from the corresponding newstest data. The CNN/Daily Mail dataset contains 287,226 documents for training, 13,368 documents for validation and 11,490 documents for test. Same as [5], we report the perplexity on both valid and test sets. CIFAR-100 dataset consists of 60k images of 100 classes, and there are 600 images per class with 500 for training and 100 for testing. |
| Hardware Specification | No | The paper states, 'We provide the details in Appendix A.' (Question 3d in the checklist); however, Appendix A is not included in the provided text. |
| Software Dependencies | No | The paper mentions using 'Fairseq [48]' but does not specify a version number for this or any other software dependency. |
| Experiment Setup | Yes | The weight α is set as 5 for all translation tasks. For each task, different random seeds and parameter settings are required, thus we dynamically adjust the coefficient α among {0.1, 0.5, 1.0} for each setting. In this task, the coefficient weight α is set as 0.7 to control the KL-divergence. We simply set the weight α to be 1.0 without tuning during training. During fine-tuning, the weight α is set as 0.6 for both models. We vary k in {1, 2, 5, 10}. Here we vary the α in {1, 3, 5, 7, 10} and conduct experiments. |
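The pseudocode entry above (Algorithm 1, R-Drop Training) boils down to running each batch through the model twice with independent dropout masks and minimizing cross-entropy plus an α-weighted symmetric KL term between the two predicted distributions. A minimal NumPy sketch of that loss is below; the single-layer toy model, shapes, and default hyperparameters are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def stochastic_forward(x, W, drop_p):
    # One forward pass with a fresh (inverted) dropout mask on the input.
    mask = rng.random(x.shape) >= drop_p
    h = (x * mask) / (1.0 - drop_p)
    return softmax(h @ W)  # toy single-layer classifier (illustrative)

def r_drop_loss(x, y, W, alpha=5.0, drop_p=0.3):
    # Two passes over the same batch -> two distributions p1, p2.
    p1 = stochastic_forward(x, W, drop_p)
    p2 = stochastic_forward(x, W, drop_p)
    n = np.arange(len(y))
    # Cross-entropy on both passes (negative log-likelihood of gold labels).
    ce = -(np.log(p1[n, y]) + np.log(p2[n, y])).mean()
    # Bidirectional KL divergence between the two predicted distributions.
    kl12 = (p1 * (np.log(p1) - np.log(p2))).sum(-1).mean()
    kl21 = (p2 * (np.log(p2) - np.log(p1))).sum(-1).mean()
    return ce + alpha * 0.5 * (kl12 + kl21)
```

With `alpha=0` this reduces to ordinary two-pass cross-entropy; the α values quoted in the Experiment Setup row (e.g. 5 for translation, 0.6–1.0 for fine-tuning) weight the KL regularizer per task.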