Relational Gating for "What If" Reasoning

Authors: Chen Zheng, Parisa Kordjamshidi

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that modeling pairwise relationships helps to capture higher-order relations and find the line of reasoning for causes and effects in the procedural descriptions. Our proposed approach achieves the state-of-the-art results on the WIQA dataset.
Researcher Affiliation | Academia | Chen Zheng and Parisa Kordjamshidi, Michigan State University, {zhengc12, kordjams}@msu.edu
Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Our code is available at https://github.com/HLR/RGN.
Open Datasets | Yes | WIQA dataset is available at http://data.allenai.org/wiqa/.
Dataset Splits | Yes | Train: 29808, Dev: 6894, Test V1: 3993, Test V2: 3003, Total: 43698 questions.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | We implemented RGN using PyTorch. We used RoBERTa Base in our model. The paper mentions PyTorch and RoBERTa, but does not provide specific version numbers for these software components.
Experiment Setup | Yes | For each data sample, we keep 128 tokens as the max length for the question, and 256 tokens as the max length for paragraph contents. Notice that both gated entity representations for question and paragraph use k = 10 for selecting top-k entities in our experiments. The value of this hyper-parameter was selected after experimenting with various values in {3, 5, 7, 10, 15, 20} using the development dataset. For the gated relation representations, top-10 ranked pairs are used to reduce the computational cost and remove unnecessary relations. In the relation gating process, we use two hidden layers for the multi-layer perceptrons. The task-specific output classifier contains two MLP layers. The model is optimized using the Adam optimizer. The training batch size is 4. During training, we freeze the parameters of RoBERTa in the first two epochs, and we stop training once no performance improvement is observed on the development dataset, which happens after 8 epochs.
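
As a reading aid, here is a minimal PyTorch sketch of the setup quoted above. It is not the authors' RGN implementation (that is at https://github.com/HLR/RGN): the class names (GatedEntityScorer, RGNSketch), the learning rate, the mean pooling, and the use of token states as a stand-in for entity representations are assumptions. Only the quoted hyper-parameters come from the section: top-k selection with k = 10, two hidden layers in the gating MLPs, a two-layer output classifier, the Adam optimizer, batch size 4, RoBERTa frozen for the first two epochs, and roughly 8 training epochs.

```python
# Minimal sketch (NOT the authors' code; see https://github.com/HLR/RGN).
# Assumed/illustrative: class names, learning rate, mean pooling, label count.
# Quoted setup: k = 10 top-k gating, two hidden layers in the gating MLPs,
# a two-layer output classifier, Adam, batch size 4, RoBERTa frozen for the
# first two epochs, training stopped after about 8 epochs.
import torch
import torch.nn as nn
from transformers import RobertaModel


class GatedEntityScorer(nn.Module):
    """Scores representations and keeps a soft-gated top-k subset (k = 10)."""

    def __init__(self, hidden_size: int, k: int = 10):
        super().__init__()
        self.k = k
        # Two hidden layers for the gating multi-layer perceptron.
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, 1),
        )

    def forward(self, reprs: torch.Tensor) -> torch.Tensor:
        # reprs: (batch, num_items, hidden)
        scores = self.mlp(reprs).squeeze(-1)              # (batch, num_items)
        k = min(self.k, scores.size(-1))
        top_scores, top_idx = scores.topk(k, dim=-1)      # keep the top-k items
        gates = torch.sigmoid(top_scores).unsqueeze(-1)   # soft gate per item
        selected = reprs.gather(
            1, top_idx.unsqueeze(-1).expand(-1, -1, reprs.size(-1)))
        return gates * selected                           # gated representations


class RGNSketch(nn.Module):
    """RoBERTa encoder + top-k gating + two-layer output classifier."""

    def __init__(self, num_labels: int = 3, k: int = 10):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained("roberta-base")
        hidden = self.encoder.config.hidden_size
        self.gate = GatedEntityScorer(hidden, k=k)
        self.classifier = nn.Sequential(                  # two MLP layers
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_labels),
        )

    def forward(self, input_ids, attention_mask):
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        gated = self.gate(states)     # token-level stand-in for entity reprs
        return self.classifier(gated.mean(dim=1))


def set_encoder_frozen(model: RGNSketch, frozen: bool) -> None:
    """RoBERTa parameters are frozen during the first two epochs."""
    for p in model.encoder.parameters():
        p.requires_grad = not frozen


model = RGNSketch()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # lr is an assumption
# for epoch in range(8):                      # ~8 epochs per the quoted setup
#     set_encoder_frozen(model, frozen=epoch < 2)
#     ...train with batch size 4; question/paragraph truncated to 128/256 tokens
```

The actual model also gates pairwise relation representations (top-10 ranked pairs); that part is omitted here. The sketch only shows where the quoted hyper-parameters (k, MLP depth, classifier depth, optimizer, batch size, freezing schedule) plug in.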