Exploiting Background Knowledge in Compact Answer Generation for Why-Questions

Authors: Ryu Iida, Canasai Kruengkrai, Ryo Ishida, Kentaro Torisawa, Jong-Hoon Oh, Julien Kloetzer

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results show that our method achieved significantly better ROUGE F-scores than existing encoder-decoder models and their variations that were augmented with query-attention and memory networks, which are used to exploit the background knowledge.
Researcher Affiliation | Collaboration | Ryu Iida, Canasai Kruengkrai, Ryo Ishida, Kentaro Torisawa, Jong-Hoon Oh, and Julien Kloetzer; National Institute of Information and Communications Technology, Kyoto, 619-0289, Japan; {ryu.iida, torisawa, rovellia, julien}@nict.go.jp, canasai@gmail.com, ishida.ryo@jp.panasonic.com
Pseudocode | No | The paper describes the method using text and a diagram, but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a direct link or explicit statement about the availability of open-source code for the described methodology.
Open Datasets | Yes | Basically, we used the datasets used in Ishida et al. (2018), which were built in the following manner.
Dataset Splits | Yes | Each of our four datasets (training, validation, development, and test) consists of triples of a why-question, an answer passage, and a compact answer. Table 2 provides the number of triples in each: training set, 15,130 triples from 2,060 questions; validation set, 2,271 triples from 426 questions; development set, 5,920 triples from 1,302 questions; test set, 17,315 triples from 3,530 questions. (These sizes are also written out in the first sketch below.)
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory used for running the experiments.
Software Dependencies | No | For word segmentation, we used the morphological analyzer MeCab (Kudo, Yamamoto, and Matsumoto 2004). We pretrained the word embedding vectors on Japanese Wikipedia articles using word2vec (Mikolov et al. 2013) and fixed their weight vectors during the training of each method. (See the preprocessing sketch below.)
Experiment Setup | Yes | For all the methods, we used the following settings determined through preliminary experiments using the development data. Both the sizes of the word embeddings and the RNN (i.e., GRU and LSTM) hidden states were set to 500. The source and target vocabulary sizes were both set to 50,000. We used Adam (Kingma and Ba 2015) with a learning rate of 0.001 and mini-batches of 32 for optimization. If the validation error did not decrease after an epoch, the learning rate was divided by two. We used 1-layer RNNs as a decoder. We independently tried {1,2,3,4}-layer RNNs as encoders and chose the one that led to the optimal ROUGE-1 F-score on the development data. We ran a maximum of 20 epochs and selected the best model of each method based on the perplexity of the validation data. (See the training-schedule sketch below.)
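For reference, the split sizes reported in the Dataset Splits row can be written down as a small data-structure sketch. The Triple type and SPLIT_SIZES names are illustrative, not from the paper; only the numbers come from Table 2.

```python
# Data-structure sketch of the splits; names here are illustrative only.
from typing import NamedTuple

class Triple(NamedTuple):
    why_question: str
    answer_passage: str
    compact_answer: str

# Numbers of triples and distinct questions reported in Table 2 of the paper.
SPLIT_SIZES = {
    "training":    {"triples": 15_130, "questions": 2_060},
    "validation":  {"triples": 2_271,  "questions": 426},
    "development": {"triples": 5_920,  "questions": 1_302},
    "test":        {"triples": 17_315, "questions": 3_530},
}

# Sanity check: the four splits together contain 40,636 triples.
assert sum(s["triples"] for s in SPLIT_SIZES.values()) == 40_636
```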
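A minimal sketch of the preprocessing described in the Software Dependencies row, assuming the mecab-python3 binding, gensim as a stand-in for the original word2vec tool, and PyTorch for the frozen embedding layer. The corpus file name and the window/min_count settings are assumptions; the 500-dimensional embeddings match the Experiment Setup row.

```python
# Preprocessing sketch: MeCab word segmentation, word2vec pretraining,
# and frozen embedding weights. Hyperparameters other than the embedding
# size (500, from the paper) are assumptions.
import MeCab
import torch
import torch.nn as nn
from gensim.models import Word2Vec

tagger = MeCab.Tagger("-Owakati")  # "wakati" mode: space-separated segmentation

def segment(text: str) -> list[str]:
    """Segment a Japanese sentence into words with MeCab."""
    return tagger.parse(text).strip().split()

# Pretrain embeddings on Japanese Wikipedia sentences
# (jawiki_sentences.txt is a hypothetical one-sentence-per-line dump).
with open("jawiki_sentences.txt", encoding="utf-8") as f:
    sentences = [segment(line) for line in f]
w2v = Word2Vec(sentences, vector_size=500, window=5, min_count=5, workers=4)

# Load the pretrained vectors into an embedding layer and freeze them,
# mirroring the fixed weight vectors used during training.
embedding = nn.Embedding.from_pretrained(torch.tensor(w2v.wv.vectors), freeze=True)
```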
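A minimal sketch of the training schedule from the Experiment Setup row, assuming PyTorch. The model and mini-batches are dummy placeholders, not the authors' encoder-decoder; only the schedule (Adam at 0.001, batches of 32, halving the learning rate on a non-decreasing validation loss, at most 20 epochs, model selection by validation perplexity) follows the paper.

```python
# Training-schedule sketch; the model and data are placeholders.
import math
import torch
import torch.nn as nn

vocab_size, hidden = 50_000, 500  # vocabulary and hidden sizes from the paper

# Stand-in for the encoder-decoder model.
model = nn.Sequential(nn.Embedding(vocab_size, hidden), nn.Linear(hidden, vocab_size))
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()

def run_epoch(train: bool) -> float:
    """Hypothetical helper: one pass over dummy data, returning mean cross-entropy."""
    xs = torch.randint(0, vocab_size, (32, 10))  # dummy mini-batch of 32
    ys = torch.randint(0, vocab_size, (32, 10))
    logits = model(xs)
    loss = loss_fn(logits.view(-1, vocab_size), ys.view(-1))
    if train:
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss.item()

best_ppl, prev_val = float("inf"), float("inf")
for epoch in range(20):                  # a maximum of 20 epochs
    run_epoch(train=True)
    val_loss = run_epoch(train=False)
    if val_loss >= prev_val:             # validation error did not decrease
        for g in optimizer.param_groups:
            g["lr"] /= 2.0               # divide the learning rate by two
    prev_val = val_loss
    ppl = math.exp(val_loss)             # perplexity of the validation data
    if ppl < best_ppl:                   # keep the best model by perplexity
        best_ppl = ppl
        torch.save(model.state_dict(), "best_model.pt")
```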