Weakly-Supervised Grammar-Informed Bayesian CCG Parser Learning

Authors: Dan Garrette, Chris Dyer, Jason Baldridge, Noah Smith

AAAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated our approach on the three available CCG corpora: English CCGBank (Hockenmaier and Steedman 2007), Chinese Treebank CCG (Tse and Curran 2010), and the Italian CCG-TUT corpus (Bos, Bosco, and Mazzei 2009). Each corpus was split into four non-overlapping datasets: a portion for constructing the tag dictionary, sentences for the unlabeled training data, development trees (used for tuning the α, p_term, p_mod, and p_fwd hyperparameters), and test trees. We used the same splits as Garrette et al. (2014).
Researcher Affiliation | Academia | Department of Computer Science, University of Texas at Austin, dhg@cs.utexas.edu; School of Computer Science, Carnegie Mellon University, {cdyer,nasmith}@cs.cmu.edu; Department of Linguistics, University of Texas at Austin, jbaldrid@utexas.edu
Pseudocode | Yes | Borrowing from the recursive generative function notation of Johnson, Griffiths, and Goldwater (2007), our process can be summarized as:

Parameters:
  σ ∼ Dirichlet(α_σ, σ_0)                   root categories
  θ_t ∼ Dirichlet(α_θ, θ_0)     ∀ t ∈ T     binary productions
  π_t ∼ Dirichlet(α_π, π_0)     ∀ t ∈ T     unary productions
  µ_t ∼ Dirichlet(α_µ, µ_0^t)   ∀ t ∈ T     terminal productions
  λ_t ∼ Dir(⟨1, 1, 1⟩)          ∀ t ∈ T     production mixture

Sentence:
  s ∼ Categorical(σ)
  generate(s)

where function generate(t):
  z ∼ Categorical(λ_t)
  if z = 1:  ⟨u, v⟩ | t ∼ Categorical(θ_t);  Tree(t, generate(u), generate(v))
  if z = 2:  u | t ∼ Categorical(π_t);       Tree(t, generate(u))
  if z = 3:  w | t ∼ Categorical(µ_t);       Leaf(t, w)
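For concreteness, below is a minimal Python sketch of that generative story. The toy category set, vocabulary, and uniform base measures are placeholder assumptions (the paper uses grammar-informed priors over CCG categories), and a depth cap is added so sampling always terminates; the α values mirror those reported in the Experiment Setup row.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy category set standing in for the CCG category set T.
CATS = ["S", "NP", "N", "(S\\NP)", "(NP/N)"]
VOCAB = ["the", "dog", "barks", "sees", "a", "cat"]

def draw_dirichlet(alpha, base):
    """Draw a categorical parameter vector from Dirichlet(alpha * base)."""
    return rng.dirichlet(alpha * np.asarray(base))

# Uniform placeholder base measures for sigma_0, theta_0, pi_0, mu_0^t.
n_cats, n_words = len(CATS), len(VOCAB)
sigma = draw_dirichlet(1.0, np.ones(n_cats) / n_cats)                       # root categories
theta = {t: draw_dirichlet(100.0, np.ones(n_cats**2) / n_cats**2)           # binary productions
         for t in CATS}
pi    = {t: draw_dirichlet(10_000.0, np.ones(n_cats) / n_cats)              # unary productions
         for t in CATS}
mu    = {t: draw_dirichlet(10_000.0, np.ones(n_words) / n_words)            # terminal productions
         for t in CATS}
lam   = {t: rng.dirichlet([1.0, 1.0, 1.0]) for t in CATS}                   # production mixture

def generate(t, depth=0, max_depth=6):
    """Recursively expand category t; z = 0/1/2 here plays the role of z = 1/2/3 above."""
    z = rng.choice(3, p=lam[t])
    if depth >= max_depth or z == 2:                 # terminal production: emit a word
        w = VOCAB[rng.choice(n_words, p=mu[t])]
        return ("Leaf", t, w)
    if z == 0:                                       # binary production: pick <u, v>
        idx = rng.choice(n_cats**2, p=theta[t])
        u, v = CATS[idx // n_cats], CATS[idx % n_cats]
        return ("Tree", t, generate(u, depth + 1), generate(v, depth + 1))
    u = CATS[rng.choice(n_cats, p=pi[t])]            # unary production: pick u
    return ("Tree", t, generate(u, depth + 1))

root = CATS[rng.choice(n_cats, p=sigma)]             # s ~ Categorical(sigma)
print(generate(root))
```

Each nonterminal first draws its production type from λ_t and then expands via the corresponding distribution, mirroring the three cases of the recursive function above.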
Open Source Code | No | The paper does not provide any explicit statement about, or link to, open-source code for the methodology described.
Open Datasets | Yes | We evaluated our approach on the three available CCG corpora: English CCGBank (Hockenmaier and Steedman 2007), Chinese Treebank CCG (Tse and Curran 2010), and the Italian CCG-TUT corpus (Bos, Bosco, and Mazzei 2009).
Dataset Splits | Yes | Each corpus was split into four non-overlapping datasets: a portion for constructing the tag dictionary, sentences for the unlabeled training data, development trees (used for tuning the α, p_term, p_mod, and p_fwd hyperparameters), and test trees. We used the same splits as Garrette et al. (2014).
Hardware Specification | No | The acknowledgments state: "Experiments were run on the UTCS Mastodon Cluster, provided by NSF grant EIA-0303609." While a specific cluster is named, no details about its CPU, GPU, memory, or other hardware components are provided.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments.
Experiment Setup | Yes | For the category grammar, we used p_term=0.7, p_mod=0.1, p_fwd=0.5. For the priors, we use α_σ=1, α_θ=100, α_π=10,000, α_µ=10,000.
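To illustrate how p_term, p_mod, and p_fwd could enter a recursive prior over CCG categories, here is a hedged Python sketch: p_term governs stopping at an atomic category, p_fwd the choice between a forward and backward slash, and p_mod the chance of a modifier category (result equal to argument). The Atom/Slash classes, the atomic distribution ATOM_PRIOR, and the exact combination of factors are illustrative assumptions, not necessarily the paper's definition of the category prior.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical category representation for illustration only.
@dataclass(frozen=True)
class Atom:
    name: str                      # e.g. "S", "NP", "N"

@dataclass(frozen=True)
class Slash:
    result: "Cat"
    arg: "Cat"
    forward: bool                  # True for A/B, False for A\B

Cat = Union[Atom, Slash]

P_TERM, P_MOD, P_FWD = 0.7, 0.1, 0.5                        # values reported in the paper
ATOM_PRIOR = {"S": 0.4, "NP": 0.3, "N": 0.2, "PP": 0.1}     # illustrative atomic distribution

def cat_prior(c: Cat) -> float:
    """Recursive prior over CCG categories (illustrative form, not the paper's exact one)."""
    if isinstance(c, Atom):
        return P_TERM * ATOM_PRIOR.get(c.name, 0.0)
    slash_p = P_FWD if c.forward else (1.0 - P_FWD)
    if c.result == c.arg:          # modifier category such as (N/N) or (S\S)
        return (1.0 - P_TERM) * slash_p * P_MOD * cat_prior(c.result)
    return (1.0 - P_TERM) * slash_p * (1.0 - P_MOD) * cat_prior(c.result) * cat_prior(c.arg)

# Example: an intransitive-verb category (S\NP) and a noun modifier (N/N).
print(cat_prior(Slash(Atom("S"), Atom("NP"), forward=False)))
print(cat_prior(Slash(Atom("N"), Atom("N"), forward=True)))
```

Under this kind of prior, low p_term favors complex categories over atomic ones, while low p_mod makes modifier categories comparatively rare, which is how the reported values (p_term=0.7, p_mod=0.1, p_fwd=0.5) would bias the grammar.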