Learning Language Representations with Logical Inductive Bias

Authors: Jianshu Chen

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on several language understanding tasks show that our pretrained FOLNet model outperforms the existing strong transformer-based approaches.
Researcher Affiliation | Industry | Jianshu Chen, Tencent AI Lab, Bellevue, WA 98004, USA, jianshuchen@global.tencent.com
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The code along with the pretrained model checkpoints will be released publicly.
Open Datasets | Yes | (i) Wikipedia + Book Corpus (Zhu et al., 2015) (16GB in texts) and (ii) a larger set of 160GB texts consisting of Wikipedia, BookCorpus2, OpenWebText2, and Pile-CC (extracted from the Pile dataset (Gao et al., 2020)). We consider three benchmarks: GLUE (Wang et al., 2019), SQuAD 2.0 (Rajpurkar et al., 2016b), and FOLIO (Han et al., 2022).
Dataset Splits | Yes | The dataset has an official train/validation/test split with 1,004/204/227 examples, respectively.
Hardware Specification | Yes | The pretraining of FOLNet Base on Wikipedia + Book Corpus (16GB) for 8K steps takes about 12 hours using 512 V100 GPUs.
Software Dependencies | No | We implement both our pretraining and finetuning pipelines using PyTorch (Paszke et al., 2019) and automatic mixed precision (AMP) learning (Micikevicius et al., 2018) based on the Apex library (Nvidia, 2019). (No library versions are given; see the hedged setup sketch after this table.)
Experiment Setup | Yes | We report the hyper-parameters of pretraining FOLNet in Table 8. The hyper-parameters for finetuning different downstream tasks are included in Table 9.
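
The Software Dependencies row names PyTorch with Apex-based automatic mixed precision but no versions or training code. The sketch below illustrates what one such AMP training step could look like under those assumptions; the tiny linear model, AdamW optimizer, dummy batch, and "O1" opt_level are hypothetical placeholders, not the paper's released implementation.

```python
# Hedged sketch: a single mixed-precision training step with PyTorch + NVIDIA Apex AMP,
# mirroring the tooling named in the Software Dependencies row. The model, optimizer,
# batch, and opt_level below are hypothetical placeholders; the paper does not publish
# its training code or pin library versions.
import torch
import torch.nn.functional as F
from apex import amp  # NVIDIA Apex (Nvidia, 2019); assumed installed with CUDA support

model = torch.nn.Linear(768, 2).cuda()                 # stand-in for the real encoder + head
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Wrap model and optimizer for automatic mixed precision (Micikevicius et al., 2018).
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

inputs = torch.randn(8, 768, device="cuda")            # dummy batch of 8 examples
labels = torch.randint(0, 2, (8,), device="cuda")

loss = F.cross_entropy(model(inputs), labels)

# Scale the loss so FP16 gradients do not underflow, then update as usual.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
optimizer.zero_grad()
```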