Learning Language Representations with Logical Inductive Bias
Authors: Jianshu Chen
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on several language understanding tasks show that our pretrained FOLNet model outperforms the existing strong transformer-based approaches. |
| Researcher Affiliation | Industry | Jianshu Chen Tencent AI Lab, Bellevue, WA 98004, USA jianshuchen@global.tencent.com |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The code along with the pretrained model checkpoints will be released publicly. |
| Open Datasets | Yes | Wikipedia + BookCorpus (Zhu et al., 2015) (16GB in texts) and (ii) a larger set of 160GB texts consisting of Wikipedia, BookCorpus2, OpenWebText2, and Pile-CC (extracted from the Pile dataset (Gao et al., 2020)). We consider three benchmarks: GLUE (Wang et al., 2019), SQuAD 2.0 (Rajpurkar et al., 2016b), and FOLIO (Han et al., 2022). |
| Dataset Splits | Yes | The dataset has an official train/validation/test split with 1,004/204/227 examples, respectively. |
| Hardware Specification | Yes | The pretraining of FOLNet Base on Wikipedia + Book Corpus (16GB) for 8K steps takes about 12 hours using 512 V100 GPUs. |
| Software Dependencies | No | We implement both our pretraining and finetuning pipelines using PyTorch (Paszke et al., 2019) and automatic mixed precision (AMP) learning (Micikevicius et al., 2018) based on the Apex library (Nvidia, 2019). |
| Experiment Setup | Yes | We report the hyper-parameters of pretraining FOLNet in Table 8. The hyper-parameters for finetuning different downstream tasks are included in Table 9. |
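
The benchmarks quoted in the Open Datasets row (GLUE, SQuAD 2.0, FOLIO) are all publicly distributed. The paper does not describe its data pipeline, so the snippet below is only an illustrative sketch of how these benchmarks could be pulled via the Hugging Face `datasets` library; the FOLIO hub identifier in particular is an assumption.

```python
# Illustrative only: loading the public benchmarks named in the paper
# (GLUE, SQuAD 2.0, FOLIO) with the Hugging Face `datasets` library.
# The paper does not specify its data-loading code; hub IDs are assumptions.
from datasets import load_dataset

glue_mnli = load_dataset("glue", "mnli")   # one GLUE task; the others load analogously
squad_v2 = load_dataset("squad_v2")        # SQuAD 2.0 (includes unanswerable questions)
folio = load_dataset("yale-nlp/FOLIO")     # assumed hub ID for FOLIO (Han et al., 2022)

# Print whichever splits are distributed on the hub, e.g. to compare against
# the official FOLIO train/validation/test sizes (1,004/204/227) quoted above.
print({split: len(ds) for split, ds in folio.items()})
```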
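The Software Dependencies row names PyTorch with automatic mixed precision based on NVIDIA's Apex library, but no versions or code are given. The sketch below shows what a generic Apex AMP training step typically looks like under those assumptions; the model, optimizer, and opt level are placeholders, not FOLNet's actual configuration.

```python
# Illustrative sketch of an Apex-based mixed-precision training step.
# Requires NVIDIA Apex to be installed; model/optimizer are placeholders.
import torch
from apex import amp

model = torch.nn.Linear(768, 2).cuda()     # placeholder classifier head, not FOLNet
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Opt level "O1" is a common choice; the paper does not state which level was used.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

inputs = torch.randn(8, 768).cuda()
labels = torch.randint(0, 2, (8,)).cuda()
loss = torch.nn.functional.cross_entropy(model(inputs), labels)

with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()                 # loss scaling for FP16 numerical stability
optimizer.step()
```

Newer PyTorch releases ship mixed precision natively via `torch.cuda.amp`, but since the quoted text specifically names Apex, the sketch follows that convention.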