Answer Generation through Unified Memories over Multiple Passages

Authors: Makoto Nakatsuji, Sohei Okui

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Evaluations indicate that GUM-MP generates much more accurate results than the current models do." "5 Evaluation: This section evaluates GUM-MP in detail." "5.4 Results: Table 2 and Table 4 summarize the ablation study for the MS-MARCO dataset and for the Oshiete-goo dataset."
Researcher Affiliation | Industry | Makoto Nakatsuji, Sohei Okui. NTT Resonant Inc., Granpark Tower, 3-4-1 Shibaura, Minato-ku, Tokyo 108-0023, Japan. nakatsuji.makoto@gmail.com, okui@nttr.co.jp
Pseudocode | No | No pseudocode or algorithm blocks are explicitly presented or labeled in the paper.
Open Source Code | No | The paper does not provide an explicit statement about the release of source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | "We used the MS-MARCO dataset and a community-QA dataset of a Japanese QA service, Oshiete goo, in our evaluations since they provide answers with multiple passages assigned to questions." MS-MARCO [Nguyen et al., 2016]. Oshiete-goo: this dataset focused on the love-advice category of the Japanese QA community, Oshiete-goo [Nakatsuji, 2019; Nakatsuji and Okui, 2020].
Dataset Splits | No | The paper mentions training and test sets (e.g., "The training set contained 16,500 questions and the test set contained 2,500 questions." for MS-MARCO, and "Then, we randomly chose one-tenth of the questions as the test dataset. The rest was used as the training dataset." for Oshiete-goo), but does not explicitly describe a validation split.
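The random one-tenth hold-out described for the Oshiete-goo dataset can be sketched as below. This is only an illustration of the quoted procedure: the function name `split_questions` and the fixed seed are assumptions, since the paper reports neither a seed nor split code.

```python
import random

def split_questions(questions, test_fraction=0.1, seed=0):
    """Randomly hold out a fraction of questions as the test set;
    the remainder becomes the training set.
    `seed` is an assumption: the paper does not report one."""
    rng = random.Random(seed)
    shuffled = list(questions)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # Return (train, test): everything past the held-out prefix trains.
    return shuffled[n_test:], shuffled[:n_test]

# Example with 1,000 dummy question IDs: 900 train, 100 test.
train, test = split_questions(range(1000))
```

Without a reported seed, exact membership of the splits cannot be reproduced, only their sizes.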
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU, GPU, or memory specifications.
Software Dependencies | No | The paper mentions using a "GloVe model" and a "BERT-based model" but does not provide version numbers for any software dependencies (e.g., a PyTorch or TensorFlow version, or specific library versions).
Experiment Setup | Yes | "We set the word embedding size to 300 and the batch size to 32. The decoder vocabulary was restricted to 5,000 according to the frequency for the MS-MARCO dataset. The decoder vocabulary was not restricted for the Oshiete-goo dataset. Each question, passage, and answer were truncated to 50, 130, and 50 words for the MS-MARCO dataset (300, 300, and 50 words for the Oshiete-goo one). The epoch count was 30, the learning rate was 0.0005, Z in MPM was 5, and the beam size was 20."
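The reported hyperparameters can be gathered into a single configuration sketch for anyone attempting a reimplementation. The key names below are illustrative assumptions (the paper defines no configuration schema); the values are those quoted above.

```python
# Hyperparameters reported in the paper's experiment setup.
# Key names are illustrative; only the values come from the paper.
GUM_MP_CONFIG = {
    "word_embedding_size": 300,
    "batch_size": 32,
    # Decoder vocabulary: frequency-restricted to 5,000 for MS-MARCO,
    # unrestricted (None) for Oshiete-goo.
    "decoder_vocab_size": {"ms_marco": 5000, "oshiete_goo": None},
    # Truncation limits in words as (question, passage, answer).
    "truncation": {
        "ms_marco": (50, 130, 50),
        "oshiete_goo": (300, 300, 50),
    },
    "epochs": 30,
    "learning_rate": 0.0005,
    "mpm_Z": 5,       # Z in the Multiple-Passage Memory component
    "beam_size": 20,  # beam search width at decoding time
}
```

Note that the missing pieces flagged elsewhere in this report (validation split, hardware, library versions) are exactly what this configuration cannot capture.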