Probing Linguistic Information for Logical Inference in Pre-trained Language Models

Authors: Zeming Chen, Qiyue Gao | Pages: 10509-10517

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose a methodology for probing linguistic information for logical inference in pre-trained language model representations. Our probing datasets cover a list of linguistic phenomena required by major symbolic inference systems. We find that (i) pre-trained language models do encode several types of linguistic information for inference, but there are also some types of information that are weakly encoded, (ii) language models can effectively learn missing linguistic information through fine-tuning. Overall, our findings provide insights into which aspects of linguistic information for logical inference language models and their pre-training procedures capture. Moreover, we have demonstrated language models' potential as semantic and background knowledge bases for supporting symbolic inference methods. ... For each task, we conducted probing experiments on multiple contextualized language models and compared results to several strong baselines. ... Experiment Setup To answer both questions 1 and 2 in the introduction, we experiment with five pre-trained language models.
Researcher Affiliation | Academia | Rose-Hulman Institute of Technology, {chenz16, gaoq}@rose-hulman.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements about releasing code or links to a code repository for the methodology described.
Open Datasets | Yes | We selected premises from the SNLI test set as our inputs and split them into training and testing sets. ... We annotated monotonicity information on all sentences in the MED dataset (Yanaka et al. 2019) as training examples using a monotonicity annotation tool called Udep2Mono (Chen and Gao 2021). ... We select a version fine-tuned on the MultiNLI (Williams, Nangia, and Bowman 2018) dataset for each language model. (A data-split sketch follows this table.)
Dataset Splits | No | The paper specifies training and testing set sizes for each task in Figure 2 (e.g., 'train 10000', 'test 5000' for Semantic Graph), but it does not explicitly mention or provide details for a separate validation split.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models, or cloud computing instance types.
Software Dependencies | No | The paper mentions using pre-trained language models (BERT, RoBERTa, DeBERTa) and references the JIANT toolkit in the acknowledgments ('Special thanks to the Machine Learning for Language Group at NYU for their wonderful NLP toolkit, JIANT (Phang et al. 2020)'). However, it does not specify version numbers for general software dependencies or libraries.
Experiment Setup | Yes | Experiment Setup: To answer both questions 1 and 2 in the introduction, we experiment with five pre-trained language models. We selected BERT-base and BERT-large (Devlin et al. 2019), RoBERTa-base and RoBERTa-large (Liu et al. 2019), and DeBERTa (He et al. 2021). ... To ensure we only probe a pre-trained language model without modifying its parameters, we freeze its parameters to not allow for gradient updates. ... We first choose the linear classifier. ... we conducted probing using a Multi-layer Perceptron (MLP) classifier with one hidden layer. ... We select four uncontextualized word embeddings as our baselines, including random embedding, FastText (Joulin et al. 2017), GloVe (Pennington, Socher, and Manning 2014), and Word2Vec (Mikolov et al. 2013). ... we evaluate NLI models fine-tuned on MultiNLI (Williams, Nangia, and Bowman 2018), using probing tasks that do not benefit from the pre-trained models. (A probing-setup sketch follows this table.)
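
The Open Datasets row states that premises from the SNLI test set were re-split into training and testing sets for the probing tasks. The sketch below shows one plausible way to do that with the Hugging Face `datasets` library; the library choice, random seed, and 80/20 ratio are assumptions for illustration, not details taken from the paper.

```python
# Illustrative sketch (not the authors' code): collect SNLI test-set premises
# and re-split them into probing train/test sets. Seed and split ratio are
# assumptions.
import random
from datasets import load_dataset

snli_test = load_dataset("snli", split="test")

# Keep unique premises only; SNLI repeats each premise across several hypotheses.
premises = sorted({example["premise"] for example in snli_test})

random.seed(42)                          # seed is an assumption, not from the paper
random.shuffle(premises)

split_point = int(0.8 * len(premises))   # 80/20 ratio is illustrative
probe_train = premises[:split_point]
probe_test = premises[split_point:]

print(f"{len(probe_train)} training premises, {len(probe_test)} test premises")
```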
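
The Experiment Setup row describes probing frozen pre-trained language models with a linear classifier and a one-hidden-layer MLP classifier. The following minimal sketch, using PyTorch and Hugging Face Transformers, illustrates that kind of setup; the model name (bert-base-uncased), mean pooling, probe width, learning rate, and toy labels are illustrative assumptions rather than the authors' configuration.

```python
# Minimal probing sketch: freeze a pre-trained encoder, extract sentence
# representations, and train a small classifier (probe) on top of them.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased").to(device)

# Freeze the language model so probing never updates its parameters.
for param in encoder.parameters():
    param.requires_grad = False
encoder.eval()

def encode(sentences):
    """Return mean-pooled token representations for a batch of sentences."""
    batch = tokenizer(sentences, padding=True, truncation=True,
                      return_tensors="pt").to(device)
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state       # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)          # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)           # (B, H)

class MLPProbe(nn.Module):
    """One-hidden-layer MLP probe; swap in nn.Linear(hidden_size, n_classes)
    for the linear-probe condition."""
    def __init__(self, hidden_size, n_classes, probe_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(hidden_size, probe_dim),
                                 nn.ReLU(),
                                 nn.Linear(probe_dim, n_classes))

    def forward(self, x):
        return self.net(x)

probe = MLPProbe(encoder.config.hidden_size, n_classes=3).to(device)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy training step; real probing would iterate over the task's train split.
sentences = ["Every dog barks.", "Some cats do not bark."]
labels = torch.tensor([0, 1]).to(device)       # placeholder labels

optimizer.zero_grad()
logits = probe(encode(sentences))
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
```

Because only the probe's parameters receive gradients, any classification accuracy above the uncontextualized-embedding baselines can be attributed to information already present in the frozen representations rather than to task-specific training of the encoder.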