Joint RNN-Based Greedy Parsing and Word Composition

Authors: Joël Legrand and Ronan Collobert

ICLR 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | An empirical evaluation of our models as well as our compositional vectors is given in Section 4. Experiments were conducted using the standard English Penn Treebank data set (Marcus et al., 1993). We adopted the classical setup, with sections 02-21 for train, section 22 for validation, and section 23 for test. The validation corpus was used to select our hyper-parameters and best models. F1 performance scores are reported in Table 1.
Researcher Affiliation | Collaboration | Joël Legrand: Idiap Research Institute, Martigny, Switzerland; École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; joel.legrand@idiap.ch. Ronan Collobert: Facebook AI Research, Menlo Park, CA, USA; Idiap Research Institute, Martigny, Switzerland; ronan@collobert.com.
Pseudocode | No | The paper describes the greedy parsing algorithm in prose and in Figure 1, but does not present it as pseudocode or a distinct algorithm block. (A hypothetical sketch of such a loop is given below.)
Open Source Code | Yes | We provide a fully functional implementation of the method described in this paper. The parser can be downloaded at joel-legrand.fr/parser.
Open Datasets | Yes | Experiments were conducted using the standard English Penn Treebank data set (Marcus et al., 1993).
Dataset Splits | Yes | We adopted the classical setup, with sections 02-21 for train, section 22 for validation, and section 23 for test. (A split-selection sketch appears below.)
Hardware Specification | No | The paper does not give specific hardware details (e.g., GPU/CPU models, memory) for its experiments; it states only, in general terms, that 'Our systems were trained'.
Software Dependencies | No | The paper names its software dependencies but gives no version numbers for them or for any other software: part-of-speech tags were obtained using the freely available SENNA tagger (http://ml.nec-labs.com/senna), and scores were computed with the Evalb implementation (http://nlp.cs.nyu.edu/evalb). (An example Evalb invocation appears below.)
Experiment Setup | Yes | Lookup-table sizes for the words and tags (part-of-speech and parsing) are 200 and 20, respectively. The window size for the tagger is K = 7 (3 neighbours on each side). The size of the tagger's hidden layer is H = 500. The learning rate was fixed to λ = 0.15 during the stochastic gradient procedure. During training, a dropout mask is applied to the output of the lookup-tables: each element of the output is set to 0 with probability 0.25. (A sketch wiring these hyper-parameters together appears below.)
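
As noted in the Pseudocode row, the greedy parsing procedure is described only in prose and in Figure 1. The following is a minimal sketch of one plausible reading of that loop: repeatedly tag the current sequence of nodes, merge the tagged spans into new constituents (composing their representations), and stop when a single root remains. The function names (tag_chunks, compose) and the node representation are hypothetical, not taken from the paper or its released parser.

    def greedy_parse(leaves, tag_chunks, compose, max_iters=100):
        """Greedy bottom-up parse sketch.

        leaves: initial word-level nodes.
        tag_chunks: assumed model call returning non-overlapping
            (start, end, label) spans over the current node sequence.
        compose: assumed function merging child vectors into a parent vector.
        """
        nodes = list(leaves)
        for _ in range(max_iters):
            if len(nodes) <= 1:     # a single node spans the whole sentence
                break
            spans = tag_chunks(nodes)
            if not spans:           # nothing left to merge
                break
            new_nodes, i = [], 0
            for start, end, label in sorted(spans):
                new_nodes.extend(nodes[i:start])   # keep untouched nodes
                children = nodes[start:end]
                new_nodes.append({"label": label,
                                  "children": children,
                                  "vec": compose(children)})
                i = end
            new_nodes.extend(nodes[i:])
            nodes = new_nodes
        return nodes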
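
For the Dataset Splits row, here is a small sketch of selecting WSJ files by section, assuming the common Treebank-3 layout in which each two-digit section directory holds .mrg files; the root path is illustrative.

    from pathlib import Path

    PTB_ROOT = Path("treebank_3/parsed/mrg/wsj")  # illustrative path

    def section_files(sections):
        """Collect parse files for the given WSJ sections (00-24)."""
        files = []
        for sec in sections:
            files.extend(sorted((PTB_ROOT / f"{sec:02d}").glob("*.mrg")))
        return files

    train_files = section_files(range(2, 22))  # sections 02-21
    valid_files = section_files([22])          # section 22
    test_files = section_files([23])           # section 23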
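
For the Software Dependencies row: Evalb is a command-line scorer whose standard usage is evalb [-p param_file] gold_file test_file. Below is a hedged sketch of driving it from Python; the binary location, parameter file, and file names are assumptions (COLLINS.prm ships with the Evalb sources).

    import subprocess

    # Assumed paths: the evalb binary, its COLLINS.prm parameter file, and
    # bracketed gold/predicted trees in one-tree-per-line format.
    result = subprocess.run(
        ["./evalb", "-p", "COLLINS.prm", "gold.txt", "predicted.txt"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)  # bracketing precision, recall, and F1 summary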
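
For the Experiment Setup row, a minimal PyTorch sketch wiring the quoted hyper-parameters together: word embeddings of size 200, tag embeddings of size 20, window K = 7, hidden layer H = 500, dropout 0.25 on the lookup-table outputs, and SGD with learning rate 0.15. The module structure and vocabulary sizes are illustrative, not the authors' released code (note also that PyTorch dropout rescales surviving activations, whereas the paper describes plain masking).

    import torch
    import torch.nn as nn

    class WindowTagger(nn.Module):
        """Window-based tagger sketch using the quoted hyper-parameters."""

        def __init__(self, n_words, n_tags, n_labels,
                     d_word=200, d_tag=20, window=7, hidden=500, p_drop=0.25):
            super().__init__()
            self.word_emb = nn.Embedding(n_words, d_word)
            self.tag_emb = nn.Embedding(n_tags, d_tag)
            self.drop = nn.Dropout(p_drop)    # dropout on lookup outputs
            d_in = window * (d_word + d_tag)  # concatenated window features
            self.mlp = nn.Sequential(nn.Linear(d_in, hidden), nn.Tanh(),
                                     nn.Linear(hidden, n_labels))

        def forward(self, word_windows, tag_windows):
            # word_windows, tag_windows: LongTensors of shape (batch, window)
            x = torch.cat([self.drop(self.word_emb(word_windows)),
                           self.drop(self.tag_emb(tag_windows))], dim=-1)
            return self.mlp(x.flatten(start_dim=1))

    # SGD with the quoted learning rate (vocabulary/label sizes illustrative)
    model = WindowTagger(n_words=10000, n_tags=50, n_labels=27)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.15)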