What Happens Next? Event Prediction Using a Compositional Neural Network Model

Authors: Mark Granroth-Wilding, Stephen Clark

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate a range of systems that induce vector-space representations of events and use them to make predictions, comparing the results to the positive pointwise mutual information (PPMI) measure of Chambers and Jurafsky (2008, henceforth C&J08). ... The test set prediction accuracy of each of the models is shown in table 1. (A sketch of this PPMI scoring appears below the table.)
Researcher Affiliation | Academia | Mark Granroth-Wilding and Stephen Clark, {mark.granroth-wilding, stephen.clark}@cl.cam.ac.uk, Computer Laboratory, University of Cambridge, UK
Pseudocode | No | The paper describes its models and methods in text and diagrams (Figure 4) but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Implementations of all the models and the evaluation, as well as the evaluation dataset split, are available at http://mark.granroth-wilding.co.uk/papers/what happens next/.
Open Datasets | Yes | Following Chambers and Jurafsky (2008; 2009), we extract events from the NYT portion of the Gigaword corpus (Graff et al. 2003). ... Graff, D.; Kong, J.; Chen, K.; and Maeda, K. 2003. English Gigaword, LDC2003T05. Linguistic Data Consortium, Philadelphia.
Dataset Splits | Yes | We randomly select 10% of the documents in the corpus to use as a test set and 10% to use as a development set, the latter being used to compare architectures and optimize hyperparameters prior to evaluation. (A sketch of this document-level split appears below the table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU, CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions software tools such as the C&C tools, OpenNLP, word2vec, and the Gensim implementation, but does not provide specific version numbers for any of them, which would be required for reproducible software dependencies.
Experiment Setup | Yes | We train a skipgram model with hierarchical sampling, using a window size of 5 and vector size of 300. ... The input vector for each word is 300-dimensional. We use two hidden layers in the argument composition, with sizes 600 and 300, and two in the event composition, with sizes 400 and 200. Autoencoders were all trained with 30% dropout corruption for 2 iterations over the full training set, with a learning rate of 0.1 and λ = 0.001. Both subsequent training stages used a learning rate of 0.01 and λ = 0.018. The first (event composition only) was run for 3 iterations, the second (full network) for 8. All stages of training used SGD with 1,000-sized minibatches. (A sketch collecting these settings appears below the table.)
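
The C&J08 baseline mentioned in the Research Type row scores a candidate next event by summing its pointwise mutual information with the events already observed in the chain. Below is a minimal sketch of that PPMI scoring, assuming pair and event counts have already been collected from the training corpus; the function and variable names are illustrative and not taken from the paper's released code.

```python
import math
from collections import Counter

def ppmi(pair_count, count_a, count_b, total_pairs, total_events):
    """Positive pointwise mutual information between two events,
    estimated from raw counts."""
    if pair_count == 0 or count_a == 0 or count_b == 0:
        return 0.0
    p_ab = pair_count / total_pairs
    p_a = count_a / total_events
    p_b = count_b / total_events
    return max(0.0, math.log(p_ab / (p_a * p_b)))

def score_candidate(candidate, chain, pair_counts, event_counts):
    """Score a candidate event against an observed chain by summing its
    PPMI with each context event; the highest-scoring candidate wins."""
    total_pairs = sum(pair_counts.values()) or 1
    total_events = sum(event_counts.values()) or 1
    return sum(
        ppmi(pair_counts[frozenset((candidate, ctx))],
             event_counts[candidate], event_counts[ctx],
             total_pairs, total_events)
        for ctx in chain
    )

# Toy example: Counter returns 0 for unseen pairs, so unattested
# combinations simply contribute nothing to the score.
pair_counts = Counter({frozenset(("arrest", "charge")): 3,
                       frozenset(("arrest", "convict")): 2})
event_counts = Counter({"arrest": 5, "charge": 4, "convict": 3})
print(score_candidate("charge", ["arrest"], pair_counts, event_counts))
```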
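
The Dataset Splits row describes a purely random, document-level 10% test / 10% development hold-out. A minimal sketch of such a split is given below; the seed, helper name, and toy document IDs are illustrative, and the split published with the paper's code should be used for an exact reproduction.

```python
import random

def split_documents(doc_ids, test_frac=0.1, dev_frac=0.1, seed=0):
    """Randomly partition document IDs into test/dev/train sets at the
    document level, mirroring the 10%/10% hold-out described above."""
    rng = random.Random(seed)
    ids = list(doc_ids)
    rng.shuffle(ids)
    n_test = int(len(ids) * test_frac)
    n_dev = int(len(ids) * dev_frac)
    return {
        "test": ids[:n_test],
        "dev": ids[n_test:n_test + n_dev],
        "train": ids[n_test + n_dev:],
    }

splits = split_documents([f"nyt_doc_{i}" for i in range(1000)])
print({name: len(docs) for name, docs in splits.items()})
```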
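
The Experiment Setup row lists the reported hyperparameters but scatters them through prose. The sketch below collects them in one place: a Gensim skip-gram call (Gensim 4.x parameter names are assumed, and the paper's "hierarchical sampling" is interpreted here as word2vec's hierarchical softmax) followed by a plain dictionary of the remaining network and training settings. The corpus and the dictionary keys are illustrative, not the authors' configuration format.

```python
from gensim.models import Word2Vec

# Tiny placeholder corpus; in the paper the vectors are trained on text
# from the NYT portion of Gigaword.
tokenised_corpus = [
    ["police", "arrest", "suspect"],
    ["court", "charge", "suspect"],
]

# Skip-gram vectors with the reported settings: window size 5, 300 dims.
# hs=1 / negative=0 reflects the "hierarchical sampling" reading above;
# min_count=1 is only so this toy example runs.
word_vectors = Word2Vec(
    sentences=tokenised_corpus,
    vector_size=300,
    window=5,
    sg=1,
    hs=1,
    negative=0,
    min_count=1,
)

# Remaining settings exactly as quoted in the table row; "lambda" holds
# the λ value reported for each training stage.
training_config = {
    "argument_composition_layers": [600, 300],
    "event_composition_layers": [400, 200],
    "autoencoder_pretraining": {
        "dropout_corruption": 0.3, "iterations": 2,
        "learning_rate": 0.1, "lambda": 0.001,
    },
    "event_composition_training": {
        "iterations": 3, "learning_rate": 0.01, "lambda": 0.018,
    },
    "full_network_training": {
        "iterations": 8, "learning_rate": 0.01, "lambda": 0.018,
    },
    "sgd_minibatch_size": 1000,
}
```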