Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model
Authors: Wenhan Xiong, Jingfei Du, William Yang Wang, Veselin Stoyanov
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on 10 common Wikidata (Vrandečić & Krötzsch, 2014) relations reveal that existing pretrained models encode entity-level knowledge only to a limited degree. Thus, we propose a new weakly supervised knowledge learning objective that requires the model to distinguish between true and false knowledge expressed in natural language. Specifically, we replace entity mentions in the original documents with names of other entities of the same type and train the models to distinguish the correct entity mention from randomly chosen ones. Models trained with this objective demonstrate much stronger fact completion performance for most relations we test on. (A minimal sketch of this objective appears after the table.) |
| Researcher Affiliation | Collaboration | Wenhan Xiong, Jingfei Du, William Yang Wang, Veselin Stoyanov; University of California, Santa Barbara and Facebook AI; {xwhan, william}@cs.ucsb.edu, {jingfeidu, ves}@fb.com |
| Pseudocode | No | The paper describes its methods in prose, such as in the 'ENTITY REPLACEMENT TRAINING' section, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps formatted like code. |
| Open Source Code | No | The paper states, 'the fact completion baselines are implemented with Huggingface's PyTorch Transformers' and provides the URL 'https://huggingface.co/pytorch-transformers'. This refers to a third-party library used, not the open-sourcing of the authors' own WKLM methodology code. |
| Open Datasets | Yes | We use the whole English Wikipedia dump as training data and rely on all Wikipedia entities. ... We consider four question answering datasets: WebQuestions (Berant et al., 2013), TriviaQA (Joshi et al., 2017), Quasar-T (Dhingra et al., 2017) ... SearchQA (Dunn et al., 2017)... We first use the standard SQuAD (Rajpurkar et al., 2016) benchmark to validate our model's answer extraction performance. |
| Dataset Splits | Yes | We split the training data (created by distant supervision) of WebQuestions with a ratio (9:1) for training and development. ... Table 2: Properties of the QA Datasets (columns: Dataset, Train, Valid, Test). (A split sketch appears after the table.) |
| Hardware Specification | Yes | We pretrain the models with 32 V100 GPUs for 3 days. We use at most 2 GPUs for fine-tuning the paragraph reader, use 8 GPUs for fine-tuning the paragraph ranker. The entity-typing experiments require larger batch sizes and take 8 GPUs for training. |
| Software Dependencies | No | The paper mentions implementing their method with 'Fairseq (Ott et al., 2019)' and baselines with 'Huggingface's PyTorch Transformers', but it does not specify version numbers for these software components or any other key dependencies. |
| Experiment Setup | Yes | For the knowledge learning pretraining phase, we use the Adam optimizer (Kingma & Ba, 2014) with learning rate 1e-5, batch size 128 and weight decay 0.01. The model is pretrained on 32 V100 GPUs for 3 days. To train the paragraph reader for open-domain QA, we select the best learning rate from {1e-6, 5e-6, 1e-5, 2e-5} and last layer dropout ratio from {0.1, 0.2}. We set the maximum training epoch to be 10 and batch size to be 32. The maximal input sequence length is 512 for WebQuestions and 128 for the other three datasets that use sentence-level paragraphs. |
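The entity-replacement objective quoted in the Research Type row can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' released code: `mentions` is a hypothetical list of `(start, end, entity_id, entity_type)` token spans sorted left to right, `entity_names` maps entity ids to surface names, and `entities_by_type` groups entity ids by type.

```python
import random

def make_replacement_example(tokens, mentions, entity_names, entities_by_type,
                             replace_prob=0.5):
    """Corrupt a tokenized Wikipedia passage by swapping some entity mentions
    with the name of a different entity of the same type.

    Returns the corrupted token list and one label per mention
    (1 = original mention kept, 0 = mention replaced)."""
    corrupted = list(tokens)
    labels = []
    # Walk the spans right-to-left so earlier token offsets stay valid
    # even when a replacement changes the sequence length.
    for start, end, entity_id, entity_type in reversed(mentions):
        if random.random() < replace_prob:
            # Negative: a randomly chosen different entity of the same type.
            candidates = [e for e in entities_by_type[entity_type] if e != entity_id]
            corrupted[start:end] = entity_names[random.choice(candidates)].split()
            labels.append(0)
        else:
            labels.append(1)
    labels.reverse()  # restore left-to-right mention order
    return corrupted, labels
```

The encoder would then be trained to classify each mention as true or replaced, e.g. with a binary cross-entropy loss over these per-mention labels.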
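The 9:1 train/development split quoted in the Dataset Splits row could be reproduced with something like the following; `examples` and the seed are placeholders, since the paper does not say how the split was drawn.

```python
import random

def split_train_dev(examples, dev_ratio=0.1, seed=13):
    """Shuffle distantly supervised training examples and split them 9:1."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    cut = int(round(len(examples) * (1 - dev_ratio)))
    return examples[:cut], examples[cut:]
```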
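The hyperparameters quoted in the Experiment Setup row translate into roughly the following pretraining optimizer setup and fine-tuning search space; the model below is a stand-in module, not the authors' Fairseq implementation.

```python
import torch

# Stand-in for the BERT-base-style encoder plus entity-replacement head.
model = torch.nn.Linear(768, 2)

# Pretraining: Adam, lr 1e-5, weight decay 0.01 (the reported batch size of
# 128 is set in the data loader, not the optimizer).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, weight_decay=0.01)

# Paragraph-reader fine-tuning search space as reported in the paper.
reader_search_space = {
    "learning_rate": [1e-6, 5e-6, 1e-5, 2e-5],
    "last_layer_dropout": [0.1, 0.2],
    "max_epochs": 10,
    "batch_size": 32,
    "max_seq_len": {"WebQuestions": 512, "sentence_level_datasets": 128},
}
```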