Grammar as a Foreign Language
Authors: Oriol Vinyals, Łukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton
NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper we show that the domain agnostic attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset, when trained on a large synthetic corpus that was annotated using existing parsers. It also matches the performance of standard parsers when trained only on a small human-annotated dataset... |
| Researcher Affiliation | Industry | Oriol Vinyals Google vinyals@google.com Lukasz Kaiser Google lukaszkaiser@google.com Terry Koo Google terrykoo@google.com Slav Petrov Google slav@google.com Ilya Sutskever Google ilyasu@google.com Geoffrey Hinton Google geoffhinton@google.com |
| Pseudocode | No | The paper describes the model mathematically but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not state that source code for the described methodology is publicly available. |
| Open Datasets | Yes | For one, we trained on the standard WSJ training dataset... We used the OntoNotes corpus version 5 [7], the English Web Treebank [8] and the updated and corrected Question Treebank [9]. ... All treebanks are available through the Linguistic Data Consortium (LDC): OntoNotes (LDC2013T19), English Web Treebank (LDC2012T13) and Question Treebank (LDC2012R121). |
| Dataset Splits | Yes | We use the standard EVALB tool for evaluation and report F1 scores on our development set (section 22 of the Penn Treebank) and the final test set (section 23) in Table 1. |
| Hardware Specification | No | Our LSTM+A model, running on a multi-core CPU using batches of 128 sentences on a generic unoptimized decoder, can parse over 120 sentences from WSJ per second for sentences of all lengths (using beam-size 1). This mentions a 'multi-core CPU' but lacks specific model numbers or detailed specifications. |
| Software Dependencies | No | The paper mentions using word2vec but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | In our experiments we used a model with 3 LSTM layers and 256 units in each layer, which we call LSTM+A. Our input vocabulary size was 90K and we output 128 symbols... We only used dropout when training on the small WSJ dataset and its influence was significant... Our decoder uses a beam of a fixed size to calculate the output sequence of labels. We experimented with different settings for the beam size. It turns out that it is almost irrelevant. We report results that use beam size 10... (Hedged sketches of the tree linearization, model configuration, and beam-search decoding are given after this table.) |
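
The 128 output symbols correspond to a linearized representation of the constituency tree: the paper casts parsing as translating a sentence into a depth-first bracketing in which words are replaced by their part-of-speech tags and closing brackets carry the nonterminal label (e.g. "(S (NP NNP )NP ... )S"). The sketch below illustrates one such linearization; the `Node` and `linearize` names are illustrative and not taken from the paper.

```python
# Minimal sketch of linearizing a constituency tree into a symbol sequence,
# in the spirit of the paper's depth-first bracketing with labeled closing
# brackets. Names and structure here are illustrative assumptions.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str                       # nonterminal (e.g. "NP") or POS tag (e.g. "NNP")
    children: List["Node"] = field(default_factory=list)

def linearize(node: Node) -> List[str]:
    """Depth-first traversal: preterminals emit their POS tag,
    internal nodes emit "(LABEL" ... ")LABEL" around their children."""
    if not node.children:            # preterminal: the word itself is dropped,
        return [node.label]          # only its POS tag is kept
    out = [f"({node.label}"]
    for child in node.children:
        out.extend(linearize(child))
    out.append(f"){node.label}")
    return out

# "John has a dog ." with words replaced by POS tags:
tree = Node("S", [
    Node("NP", [Node("NNP")]),
    Node("VP", [Node("VBZ"),
                Node("NP", [Node("DT"), Node("NN")])]),
    Node("."),
])
print(" ".join(linearize(tree)))
# (S (NP NNP )NP (VP VBZ (NP DT NN )NP )VP . )S
```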
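To make the reported configuration concrete, here is a minimal PyTorch sketch of a 3-layer, 256-unit sequence-to-sequence model with a 90K-word input vocabulary and 128 output symbols. The paper does not tie its model to any framework; the dot-product attention and the dropout rate below are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

# Hedged sketch of the reported LSTM+A configuration: 3 LSTM layers with
# 256 units, a 90K-word input vocabulary, and 128 output symbols. Dropout
# was used only for the small-WSJ setting; the 0.3 rate below is assumed.
class Seq2SeqParser(nn.Module):
    def __init__(self, src_vocab=90_000, tgt_vocab=128, hidden=256,
                 layers=3, dropout=0.0):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.LSTM(hidden, hidden, num_layers=layers,
                               dropout=dropout, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, num_layers=layers,
                               dropout=dropout, batch_first=True)
        self.out = nn.Linear(2 * hidden, tgt_vocab)   # [decoder state; context]

    def forward(self, src_ids, tgt_ids):
        enc_out, state = self.encoder(self.src_emb(src_ids))        # (B, S, H)
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)     # (B, T, H)
        # Plain dot-product attention over encoder states (an assumption,
        # not necessarily the paper's attention mechanism).
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))        # (B, T, S)
        context = torch.bmm(torch.softmax(scores, dim=-1), enc_out) # (B, T, H)
        return self.out(torch.cat([dec_out, context], dim=-1))      # (B, T, V_out)

model = Seq2SeqParser(dropout=0.3)   # dropout only for WSJ-only training
logits = model(torch.randint(0, 90_000, (2, 12)), torch.randint(0, 128, (2, 20)))
print(logits.shape)   # torch.Size([2, 20, 128])
```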
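The decoder uses fixed-width beam search; the paper reports beam size 10 while noting the choice is almost irrelevant. Below is a generic beam-search sketch over a hypothetical `step_log_probs` callback that returns next-symbol log-probabilities given the prefix decoded so far; it is not the paper's decoder implementation.

```python
import math
from typing import Callable, List, Tuple

def beam_search(step_log_probs: Callable[[List[int]], List[float]],
                eos_id: int, beam_size: int = 10, max_len: int = 100
                ) -> List[int]:
    """Fixed-width beam search: keep the `beam_size` highest-scoring
    partial outputs at every step until all of them end in EOS."""
    beams: List[Tuple[float, List[int]]] = [(0.0, [])]   # (cumulative log-prob, prefix)
    for _ in range(max_len):
        candidates = []
        for score, prefix in beams:
            if prefix and prefix[-1] == eos_id:           # finished hypotheses carry over
                candidates.append((score, prefix))
                continue
            for sym, lp in enumerate(step_log_probs(prefix)):
                candidates.append((score + lp, prefix + [sym]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
        if all(p and p[-1] == eos_id for _, p in beams):
            break
    return beams[0][1]

# Toy usage with a dummy distribution over 5 symbols (id 4 = EOS).
def dummy(prefix):
    return [math.log(0.1)] * 4 + [math.log(0.6)]

print(beam_search(dummy, eos_id=4, beam_size=10))   # [4]
```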