Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Joint RNN-Based Greedy Parsing and Word Composition

Authors: Joël Legrand and Ronan Collobert

ICLR 2015 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | An empirical evaluation of our models as well as our compositional vectors is given in Section 4. 4 EXPERIMENTS Experiments were conducted using the standard English Penn Treebank data set (Marcus et al., 1993). We adopted the classical setup, with sections 02-21 for train, section 22 for validation, and section 23 for test. The validation corpus was used to select our hyper-parameters and best models. F1 performance scores are reported in Table 1.
Researcher Affiliation | Collaboration | Joël Legrand: Idiap Research Institute, Martigny, Switzerland; École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; EMAIL. Ronan Collobert: Facebook AI Research, Menlo Park, CA, USA; Idiap Research Institute, Martigny, Switzerland; EMAIL.
Pseudocode | No | The paper includes a description of the greedy parsing algorithm in text and Figure 1, but it is not formatted as pseudocode or a distinct algorithm block.
Open Source Code | Yes | We provide a fully functional implementation of the method described in this paper. The parser can be downloaded at joel-legrand.fr/parser.
Open Datasets | Yes | Experiments were conducted using the standard English Penn Treebank data set (Marcus et al., 1993).
Dataset Splits | Yes | We adopted the classical setup, with sections 02-21 for train, section 22 for validation, and section 23 for test.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. It only generally states 'Our systems were trained'.
Software Dependencies | No | The part-of-speech tags were obtained using the freely available software SENNA (http://ml.nec-labs.com/senna). Scores were obtained using the Evalb implementation (http://nlp.cs.nyu.edu/evalb). The paper mentions software by name (SENNA, Evalb) but does not provide specific version numbers for them or any other software dependencies.
Experiment Setup | Yes | Lookup-table sizes for the words and tags (part-of-speech and parsing) are 200 and 20, respectively. The window size for the tagger is K = 7 (3 neighbours from each side). The size of the tagger's hidden layer is H = 500. We fixed the learning rate to λ = 0.15 during the stochastic gradient procedure. In our case, during the training phase, a dropout mask is applied to the output of the lookup-tables: each element of the output is set to 0 with a probability 0.25.
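
The dataset split and experiment setup quoted above can be collected into a small configuration sketch. This is an illustration only, not the authors' implementation: the names ExperimentSetup, ptb_split, and apply_dropout are hypothetical, and the dropout helper assumes no rescaling of the kept elements, since the quoted text does not mention one.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Hyper-parameters as quoted in the Experiment Setup row (field names are hypothetical)."""
    word_lookup_size: int = 200   # lookup-table size for words
    tag_lookup_size: int = 20     # lookup-table size for POS and parsing tags
    window_size: int = 7          # K = 7, i.e. 3 neighbours on each side
    hidden_units: int = 500       # H = 500, the tagger's hidden layer
    learning_rate: float = 0.15   # fixed lambda for stochastic gradient training
    dropout_prob: float = 0.25    # probability of zeroing a lookup-table output


def ptb_split(section: int) -> str:
    """Map a Penn Treebank section number to the split quoted in the Dataset Splits row."""
    if 2 <= section <= 21:
        return "train"
    if section == 22:
        return "validation"
    if section == 23:
        return "test"
    return "unused"  # sections 00, 01, and 24 are not assigned in the quoted setup


def apply_dropout(values, prob, rng):
    """Set each element to 0 with probability `prob` (training time only).

    Assumption: kept elements are not rescaled.
    """
    return [0.0 if rng.random() < prob else v for v in values]


setup = ExperimentSetup()
neighbours_per_side = (setup.window_size - 1) // 2
masked = apply_dropout([0.5, -1.2, 0.3, 0.8], setup.dropout_prob, random.Random(0))
```

Passing an explicit random.Random instance keeps the mask reproducible across runs, which matters when comparing results against the reported F1 scores.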