Learning Statistical Scripts with LSTM Recurrent Neural Networks

Authors: Karl Pichotta, Raymond Mooney

AAAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our system on two tasks, inferring held-out events from text and inferring novel events from text, substantially outperforming prior approaches on both tasks.
Researcher Affiliation | Academia | Karl Pichotta and Raymond J. Mooney, {pichotta,mooney}@cs.utexas.edu, Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements about releasing source code for the methodology described, nor does it include a link to a code repository.
Open Datasets | Yes | For our corpus, we use English Language Wikipedia (footnote 4: http://en.wikipedia.org/, dump from Jan 2, 2014), breaking articles into paragraphs. Our training set was approximately 8.9 million event sequences, our validation set was approximately 89,000 event sequences, and our test set was 2,000 events from 411 sequences, such that no test-set article is in the training or validation set.
Dataset Splits | Yes | Our training set was approximately 8.9 million event sequences, our validation set was approximately 89,000 event sequences, and our test set was 2,000 events from 411 sequences, such that no test-set article is in the training or validation set. (An article-level split sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. It only mentions training duration.
Software Dependencies | Yes | We use version 3.3.1 of the Stanford Core NLP system. We use the implementation of LSTM provided by the Caffe library (Jia et al. 2014).
Experiment Setup | Yes | Since RNNs are quite sensitive to hyperparameter values (Sutskever et al. 2013), we measured validation set performance in different regions of hyperparameter space, ultimately selecting learning rate η = 0.1, momentum parameter μ = 0.98, LSTM vector length of 1,000, and a Normal N(0, 0.1) distribution for random initialization (biases are initialized to 0). Event component embeddings have dimension 300. We use ℓ2 regularization and Dropout (Hinton et al. 2012) with dropout probability 0.5. We clip gradient updates at 10 to prevent exploding gradients (Pascanu, Mikolov, and Bengio 2013). We damp η by 0.9 every 100,000 iterations. We train for 750,000 batch updates, which took between 50 and 60 hours. We use a beam width of 50 in all beam searches. (A training-configuration sketch follows the table.)
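The Dataset Splits row above reports only the resulting set sizes and the constraint that no test-set article appears in training or validation. The sketch below shows one way such an article-level split of Wikipedia paragraphs could be done; the function name, argument names, and article counts are illustrative assumptions, not the authors' code.

```python
import random

def split_by_article(articles, n_val_articles, n_test_articles, seed=0):
    """Article-level split: every paragraph (event sequence) from a given
    article lands in exactly one of train/validation/test, so no test-set
    article leaks into training or validation.

    `articles` is assumed to map article id -> list of paragraph event
    sequences; names and counts here are illustrative, not from the paper.
    """
    ids = sorted(articles)
    random.Random(seed).shuffle(ids)

    test_ids = set(ids[:n_test_articles])
    val_ids = set(ids[n_test_articles:n_test_articles + n_val_articles])

    train, val, test = [], [], []
    for art_id in ids:
        bucket = test if art_id in test_ids else val if art_id in val_ids else train
        bucket.extend(articles[art_id])
    return train, val, test
```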
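The Experiment Setup row lists the reported hyperparameters. Below is a hedged sketch of that configuration in PyTorch; the paper used Caffe's LSTM implementation, so the module and optimizer choices here (nn.LSTM, SGD with momentum, StepLR, clip_grad_norm_), the vocabulary size, and the ℓ2 coefficient are assumptions for illustration. The numeric values (η = 0.1, μ = 0.98, hidden size 1,000, embedding size 300, dropout 0.5, N(0, 0.1) initialization with zero biases, clipping at 10, decay of 0.9 every 100,000 iterations, 750,000 updates, beam width 50) are taken from the quoted passage.

```python
import torch
import torch.nn as nn

# Values quoted from the paper; vocab size and L2 weight are placeholders.
VOCAB_SIZE  = 50_000    # assumption: not reported in the excerpt above
EMBED_DIM   = 300       # event component embedding dimension
HIDDEN_DIM  = 1_000     # LSTM vector length
DROPOUT_P   = 0.5
LR          = 0.1       # learning rate eta
MOMENTUM    = 0.98
L2_WEIGHT   = 1e-4      # assumption: paper reports l2 regularization without a value
CLIP_AT     = 10.0      # gradient clipping threshold
LR_DECAY    = 0.9       # damp eta by 0.9 ...
DECAY_EVERY = 100_000   # ... every 100,000 iterations
MAX_UPDATES = 750_000
BEAM_WIDTH  = 50        # used at decoding time, not during training

class ScriptLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.drop = nn.Dropout(DROPOUT_P)
        self.out = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)
        # N(0, 0.1) weight initialization, biases at 0, as reported.
        for name, p in self.named_parameters():
            if "bias" in name:
                nn.init.zeros_(p)
            else:
                nn.init.normal_(p, mean=0.0, std=0.1)

    def forward(self, tokens):
        h, _ = self.lstm(self.drop(self.embed(tokens)))
        return self.out(self.drop(h))

model = ScriptLSTM()
optimizer = torch.optim.SGD(model.parameters(), lr=LR,
                            momentum=MOMENTUM, weight_decay=L2_WEIGHT)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                            step_size=DECAY_EVERY, gamma=LR_DECAY)

def training_step(batch_inputs, batch_targets):
    """One SGD update with gradient clipping and step-wise LR decay."""
    optimizer.zero_grad()
    logits = model(batch_inputs)                 # (batch, time, vocab)
    loss = nn.functional.cross_entropy(logits.transpose(1, 2), batch_targets)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP_AT)
    optimizer.step()
    scheduler.step()
    return loss.item()
```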