Probing Linguistic Information for Logical Inference in Pre-trained Language Models

Authors: Zeming Chen, Qiyue Gao | Pages: 10509-10517

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose a methodology for probing linguistic information for logical inference in pre-trained language model representations. Our probing datasets cover a list of linguistic phenomena required by major symbolic inference systems. We find that (i) pre-trained language models do encode several types of linguistic information for inference, but there are also some types of information that are weakly encoded, (ii) language models can effectively learn missing linguistic information through fine-tuning. Overall, our findings provide insights into which aspects of linguistic information for logical inference language models and their pre-training procedures capture. Moreover, we have demonstrated language models' potential as semantic and background knowledge bases for supporting symbolic inference methods. ... For each task, we conducted probing experiments on multiple contextualized language models and compared results to several strong baselines. ... Experiment Setup To answer both questions 1 and 2 in the introduction, we experiment with five pre-trained language models.
Researcher Affiliation | Academia | Rose-Hulman Institute of Technology, {chenz16, gaoq}@rose-hulman.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements about releasing code or links to a code repository for the methodology described.
Open Datasets | Yes | We selected premises from the SNLI test set as our inputs and split them into training and testing sets. ... We annotated monotonicity information on all sentences in the MED dataset (Yanaka et al. 2019) as training examples using a monotonicity annotation tool called Udep2Mono (Chen and Gao 2021). ... We select a version fine-tuned on the MultiNLI (Williams, Nangia, and Bowman 2018) dataset for each language model. (A data-split sketch follows this table.)
Dataset Splits | No | The paper specifies training and testing set sizes for each task in Figure 2 (e.g., 'train 10000', 'test 5000' for Semantic Graph), but it does not explicitly mention or provide details for a separate validation split.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models, or cloud computing instance types.
Software Dependencies | No | The paper mentions using pre-trained language models (BERT, RoBERTa, DeBERTa) and references the JIANT toolkit in the acknowledgments ('Special thanks to the Machine Learning for Language Group at NYU for their wonderful NLP toolkit, JIANT (Phang et al. 2020)'). However, it does not specify version numbers for general software dependencies or libraries.
Experiment Setup | Yes | Experiment Setup: To answer both questions 1 and 2 in the introduction, we experiment with five pre-trained language models. We selected BERT-base and BERT-large (Devlin et al. 2019), RoBERTa-base and RoBERTa-large (Liu et al. 2019), and DeBERTa (He et al. 2021). ... To ensure we only probe a pre-trained language model without modifying its parameters, we freeze its parameters to not allow for gradient updates. ... We first choose the linear classifier. ... we conducted probing using a Multi-layer Perceptron (MLP) classifier with one hidden layer. ... We select four uncontextualized word embeddings as our baselines, including random embedding, FastText (Joulin et al. 2017), GloVe (Pennington, Socher, and Manning 2014), and Word2Vec (Mikolov et al. 2013). ... we evaluate NLI models fine-tuned on MultiNLI (Williams, Nangia, and Bowman 2018), using probing tasks that do not benefit from the pre-trained models. (A probing-setup sketch follows this table.)
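
The Open Datasets row states that premises from the SNLI test set were re-split into training and testing sets for the probing tasks. The sketch below shows one plausible way to do that with the Hugging Face `datasets` library; the library choice, random seed, and 80/20 ratio are assumptions for illustration, not details taken from the paper.

```python
# Illustrative sketch (not the authors' code): collect SNLI test-set premises
# and re-split them into probing train/test sets. Seed and split ratio are
# assumptions.
import random
from datasets import load_dataset

snli_test = load_dataset("snli", split="test")

# Keep unique premises only; SNLI repeats each premise across several hypotheses.
premises = sorted({example["premise"] for example in snli_test})

random.seed(42)                          # seed is an assumption, not from the paper
random.shuffle(premises)

split_point = int(0.8 * len(premises))   # 80/20 ratio is illustrative
probe_train = premises[:split_point]
probe_test = premises[split_point:]

print(f"{len(probe_train)} training premises, {len(probe_test)} test premises")
```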
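
The Experiment Setup row describes probing frozen pre-trained language models with a linear classifier and a one-hidden-layer MLP classifier. The following minimal sketch, using PyTorch and Hugging Face Transformers, illustrates that kind of setup; the model name (bert-base-uncased), mean pooling, probe width, learning rate, and toy labels are illustrative assumptions rather than the authors' configuration.

```python
# Minimal probing sketch: freeze a pre-trained encoder, extract sentence
# representations, and train a small classifier (probe) on top of them.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased").to(device)

# Freeze the language model so probing never updates its parameters.
for param in encoder.parameters():
    param.requires_grad = False
encoder.eval()

def encode(sentences):
    """Return mean-pooled token representations for a batch of sentences."""
    batch = tokenizer(sentences, padding=True, truncation=True,
                      return_tensors="pt").to(device)
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state       # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)          # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)           # (B, H)

class MLPProbe(nn.Module):
    """One-hidden-layer MLP probe; swap in nn.Linear(hidden_size, n_classes)
    for the linear-probe condition."""
    def __init__(self, hidden_size, n_classes, probe_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(hidden_size, probe_dim),
                                 nn.ReLU(),
                                 nn.Linear(probe_dim, n_classes))

    def forward(self, x):
        return self.net(x)

probe = MLPProbe(encoder.config.hidden_size, n_classes=3).to(device)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy training step; real probing would iterate over the task's train split.
sentences = ["Every dog barks.", "Some cats do not bark."]
labels = torch.tensor([0, 1]).to(device)       # placeholder labels

optimizer.zero_grad()
logits = probe(encode(sentences))
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
```

Because only the probe's parameters receive gradients, any classification accuracy above the uncontextualized-embedding baselines can be attributed to information already present in the frozen representations rather than to task-specific training of the encoder.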