Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model
Authors: Wenhan Xiong, Jingfei Du, William Yang Wang, Veselin Stoyanov
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on 10 common Wikidata (Vrandečić & Krötzsch, 2014) relations reveal that existing pretrained models encode entity-level knowledge only to a limited degree. Thus, we propose a new weakly supervised knowledge learning objective that requires the model to distinguish between true and false knowledge expressed in natural language. Specifically, we replace entity mentions in the original documents with names of other entities of the same type and train the models to distinguish the correct entity mention from randomly chosen ones. Models trained with this objective demonstrate much stronger fact completion performance for most relations we test on. (A minimal sketch of this objective appears after the table.) |
| Researcher Affiliation | Collaboration | Wenhan Xiong, Jingfei Du, William Yang Wang, Veselin Stoyanov; University of California, Santa Barbara and Facebook AI; {xwhan, william}@cs.ucsb.edu, {jingfeidu, ves}@fb.com |
| Pseudocode | No | The paper describes its methods in prose, such as in the 'ENTITY REPLACEMENT TRAINING' section, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps formatted like code. |
| Open Source Code | No | The paper states, 'the fact completion baselines are implemented with Huggingface's PyTorch Transformers' and provides the URL 'https://huggingface.co/pytorch-transformers'. This refers to a third-party library used, not the open-sourcing of the authors' own WKLM methodology code. |
| Open Datasets | Yes | We use the whole English Wikipedia dump as training data and rely on all Wikipedia entities. ... We consider four question answering datasets: WebQuestions (Berant et al., 2013), TriviaQA (Joshi et al., 2017), Quasar-T (Dhingra et al., 2017) ... SearchQA (Dunn et al., 2017)... We first use the standard SQuAD (Rajpurkar et al., 2016) benchmark to validate our model's answer extraction performance. |
| Dataset Splits | Yes | We split the training data (created by distant supervision) of WebQuestions with a ratio (9:1) for training and development. ... Table 2: Properties of the QA Datasets (columns: Dataset, Train, Valid, Test). (A split sketch appears after the table.) |
| Hardware Specification | Yes | We pretrain the models with 32 V100 GPUs for 3 days. We use at most 2 GPUs for fine-tuning the paragraph reader, use 8 GPUs for fine-tuning the paragraph ranker. The entity-typing experiments require larger batch sizes and take 8 GPUs for training. |
| Software Dependencies | No | The paper mentions implementing their method with 'Fairseq (Ott et al., 2019)' and baselines with 'Huggingface's PyTorch Transformers', but it does not specify version numbers for these software components or any other key dependencies. |
| Experiment Setup | Yes | For the knowledge learning pretraining phase, we use the Adam optimizer (Kingma & Ba, 2014) with learning rate 1e-5, batch size 128 and weight decay 0.01. The model is pretrained on 32 V100 GPUs for 3 days. To train the paragraph reader for open-domain QA, we select the best learning rate from {1e-6, 5e-6, 1e-5, 2e-5} and last layer dropout ratio from {0.1, 0.2}. We set the maximum training epoch to be 10 and batch size to be 32. The maximal input sequence length is 512 for WebQuestions and 128 for the other three datasets that use sentence-level paragraphs. |
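The entity-replacement objective quoted in the Research Type row can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' released code: `mentions` is a hypothetical list of `(start, end, entity_id, entity_type)` token spans sorted left to right, `entity_names` maps entity ids to surface names, and `entities_by_type` groups entity ids by type.

```python
import random

def make_replacement_example(tokens, mentions, entity_names, entities_by_type,
                             replace_prob=0.5):
    """Corrupt a tokenized Wikipedia passage by swapping some entity mentions
    with the name of a different entity of the same type.

    Returns the corrupted token list and one label per mention
    (1 = original mention kept, 0 = mention replaced)."""
    corrupted = list(tokens)
    labels = []
    # Walk the spans right-to-left so earlier token offsets stay valid
    # even when a replacement changes the sequence length.
    for start, end, entity_id, entity_type in reversed(mentions):
        if random.random() < replace_prob:
            # Negative: a randomly chosen different entity of the same type.
            candidates = [e for e in entities_by_type[entity_type] if e != entity_id]
            corrupted[start:end] = entity_names[random.choice(candidates)].split()
            labels.append(0)
        else:
            labels.append(1)
    labels.reverse()  # restore left-to-right mention order
    return corrupted, labels
```

The encoder would then be trained to classify each mention as true or replaced, e.g. with a binary cross-entropy loss over these per-mention labels.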
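The 9:1 train/development split quoted in the Dataset Splits row could be reproduced with something like the following; `examples` and the seed are placeholders, since the paper does not say how the split was drawn.

```python
import random

def split_train_dev(examples, dev_ratio=0.1, seed=13):
    """Shuffle distantly supervised training examples and split them 9:1."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    cut = int(round(len(examples) * (1 - dev_ratio)))
    return examples[:cut], examples[cut:]
```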
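The hyperparameters quoted in the Experiment Setup row translate into roughly the following pretraining optimizer setup and fine-tuning search space; the model below is a stand-in module, not the authors' Fairseq implementation.

```python
import torch

# Stand-in for the BERT-base-style encoder plus entity-replacement head.
model = torch.nn.Linear(768, 2)

# Pretraining: Adam, lr 1e-5, weight decay 0.01 (the reported batch size of
# 128 is set in the data loader, not the optimizer).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, weight_decay=0.01)

# Paragraph-reader fine-tuning search space as reported in the paper.
reader_search_space = {
    "learning_rate": [1e-6, 5e-6, 1e-5, 2e-5],
    "last_layer_dropout": [0.1, 0.2],
    "max_epochs": 10,
    "batch_size": 32,
    "max_seq_len": {"WebQuestions": 512, "sentence_level_datasets": 128},
}
```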