Generalised Brown Clustering and Roll-Up Feature Generation
Authors: Leon Derczynski, Sean Chester
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Table 2 presents extrinsic results for decoupling a and \|C\|. We measure F1 at the CoNLL 03 task’s test-B set, using a linear-chain CRF and shearing at depths 4, 6, 10 and 20 as the only features, evaluating with CRFsuite at token level. |
| Researcher Affiliation | Academia | Leon Derczynski, University of Sheffield, 211 Portobello, Sheffield S1 4DP, United Kingdom, leon@dcs.shef.ac.uk; Sean Chester, NTNU, Sem Saelandsvei 9, 7491 Trondheim, Norway, and Aarhus Universitet, Abogade 34, 8200 Aarhus N, Denmark, sean.chester@idi.ntnu.no |
| Pseudocode | Yes | Algorithm 1: Brown clustering as proposed by Brown et al. Algorithm 2: Generalised Brown clustering. |
| Open Source Code | Yes | Software for Generalised Brown clustering and roll-up feature generation is available freely at http://dx.doi.org/10.5281/zenodo.33758 (Chester and Derczynski 2015). |
| Open Datasets | Yes | We measure F1 at the CoNLL 03 task’s test-B set, using a linear-chain CRF... using a computationally feasible subset of the Brown corpus (Francis and Kucera 1979) with 12k tokens and 3.7k word types... Using the RCV1 corpus cleaned as per Liang (2005) |
| Dataset Splits | No | The paper mentions test data but does not explicitly provide details about training/validation splits (percentages, sample counts, or cross-validation setup). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like "CRFsuite" and "conlleval.pl" but does not specify version numbers for these dependencies. |
| Experiment Setup | Yes | Table 2 presents extrinsic results for decoupling a and \|C\|. We measure F1 at the CoNLL 03 task’s test-B set, using a linear-chain CRF and shearing at depths 4, 6, 10 and 20 as the only features, evaluating with CRFsuite at token level... using CRFsuite with stochastic gradient descent, and evaluating with conlleval.pl at chunk level... with a = 2560 as per Derczynski, Chester, and Bøgh (2015), we shear the tree for each bitdepth in l = {4, 6, 10, 20} as per Ratinov and Roth (2009) and others in later literature. |
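
The experiment setup quotes shearing the cluster tree at bit depths 4, 6, 10 and 20. A minimal sketch of what that could look like, assuming each word type is mapped to a bit-string path from the root of the cluster tree and that shearing at depth d means keeping the first d bits of that path. The function names, feature keys, and toy paths below are hypothetical illustrations and are not taken from the authors' released software.

```python
from typing import Dict, List, Sequence

# Bit depths quoted in the experiment setup (l = {4, 6, 10, 20}).
SHEAR_DEPTHS = (4, 6, 10, 20)


def shear_path(bit_path: str, depths: Sequence[int] = SHEAR_DEPTHS) -> Dict[str, str]:
    """Return one prefix feature per depth.

    A path shorter than a given depth is kept whole, since Python
    slicing does not pad or raise on short strings.
    """
    return {f"brown_{d}": bit_path[:d] for d in depths}


def token_features(tokens: List[str], paths: Dict[str, str]) -> List[Dict[str, str]]:
    """Build a per-token feature dict; tokens without a known path get no features."""
    feats = []
    for tok in tokens:
        path = paths.get(tok)
        feats.append(shear_path(path) if path is not None else {})
    return feats


# Hypothetical toy paths, invented for illustration only.
toy_paths = {
    "London": "110100101101",
    "Paris": "110100101100",
    "runs": "0111",
}

features = token_features(["London", "runs", "fast"], toy_paths)
```

In this toy data, "London" and "Paris" differ only in their last bit, so they share every sheared prefix up to depth 10 and are separated only by the depth-20 feature. Dicts of this shape could then be serialised into whatever attribute format the chosen CRF toolkit expects.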