Keep Skills in Mind: Understanding and Implementing Skills in Commonsense Question Answering
Authors: Meikai Bao, Qi Liu, Kai Zhang, Ye Liu, Linan Yue, Longfei Li, Jun Zhou
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on two publicly available CQA datasets show the effectiveness of our proposed model and the considerable impact of introducing skills. |
| Researcher Affiliation | Collaboration | Meikai Bao (1,2), Qi Liu (1,2), Kai Zhang (1,2), Ye Liu (1,2), Linan Yue (1,2), Longfei Li (3), Jun Zhou (3). Affiliations: (1) Anhui Province Key Laboratory of Big Data Analysis and Application, University of Science and Technology of China; (2) State Key Laboratory of Cognitive Intelligence; (3) Ant Financial Services Group |
| Pseudocode | No | The paper describes the model's architecture and processes through textual descriptions and diagrams (Figure 2), but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is at https://github.com/BAOOOOOM/DSCQA. |
| Open Datasets | Yes | We use two widely-used commonsense datasets, i.e., Commonsense QA (CSQA [Talmor et al., 2019]) and Commonsense QA 2.0 (CSQA2 [Talmor et al., 2021]), as benchmarks. |
| Dataset Splits | Yes | Table 2 lists skills and their frequency per split (an example may involve more than one skill), reported for the CSQA2 train/dev/test splits and the CSQA train/dev splits; for example, the causality skill appears in 5.71% / 5.63% / 6.47% of CSQA2 train/dev/test examples and in 4.39% / 3.85% of CSQA train/dev examples... |
| Hardware Specification | No | The paper mentions using 'T5-large' as the backbone model but does not specify the hardware (e.g., specific GPU or CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using AdamW as the optimizer, a fine-tuned Sentence-T5 as the context encoder, and OpenPrompt as a framework, but does not provide specific version numbers for these software dependencies or for underlying libraries such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | We use AdamW [Loshchilov and Hutter, 2019] as the optimizer and set the learning rate to 1e-5. We set the maximum length of the model input to 64. For general prefixes, the prefix length is set to 100, and its dropout rate is set to 0.5. The number of attention heads is set to 12 for question-skill attention and 8 for skill-question attention. |
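
To make the reported setup easier to reuse, below is a minimal configuration sketch that collects the hyperparameters quoted in the Experiment Setup row (AdamW with learning rate 1e-5, maximum input length 64, prefix length 100, prefix dropout 0.5, and 12/8 attention heads). The class and field names (`DSCQAConfig`, `build_optimizer`, etc.) are our own labels, not identifiers from the authors' repository.

```python
from dataclasses import dataclass

import torch
from torch.optim import AdamW


@dataclass
class DSCQAConfig:
    """Hyperparameters reported in the paper; field names are illustrative."""

    backbone: str = "t5-large"       # backbone model named in the paper
    learning_rate: float = 1e-5      # AdamW learning rate
    max_input_length: int = 64       # maximum model input length
    prefix_length: int = 100         # general prefix length
    prefix_dropout: float = 0.5      # dropout rate on the general prefix
    question_skill_heads: int = 12   # heads for question-skill attention
    skill_question_heads: int = 8    # heads for skill-question attention


def build_optimizer(model: torch.nn.Module, cfg: DSCQAConfig) -> AdamW:
    """Attach AdamW with the reported learning rate to a model's parameters."""
    return AdamW(model.parameters(), lr=cfg.learning_rate)


if __name__ == "__main__":
    cfg = DSCQAConfig()
    # Placeholder module standing in for the prompt-tuned T5-large backbone.
    dummy_model = torch.nn.Linear(8, 8)
    optimizer = build_optimizer(dummy_model, cfg)
    print(cfg, optimizer.defaults["lr"])
```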
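The paper also specifies separate head counts for question-skill attention (12) and skill-question attention (8). The sketch below shows one plausible reading of that bidirectional cross-attention using standard `nn.MultiheadAttention`; the hidden size of 768, the module name, and the overall wiring are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class QuestionSkillInteraction(nn.Module):
    """Bidirectional cross-attention between question and skill representations.

    Head counts follow the paper (12 for question-skill, 8 for skill-question);
    the hidden size and wiring are illustrative assumptions.
    """

    def __init__(self, hidden_size: int = 768) -> None:
        super().__init__()
        # Question tokens attend to skill representations with 12 heads.
        self.question_skill_attn = nn.MultiheadAttention(
            hidden_size, num_heads=12, batch_first=True
        )
        # Skill representations attend to the question with 8 heads.
        self.skill_question_attn = nn.MultiheadAttention(
            hidden_size, num_heads=8, batch_first=True
        )

    def forward(
        self, question: torch.Tensor, skills: torch.Tensor
    ) -> tuple[torch.Tensor, torch.Tensor]:
        q_enriched, _ = self.question_skill_attn(question, skills, skills)
        s_enriched, _ = self.skill_question_attn(skills, question, question)
        return q_enriched, s_enriched


if __name__ == "__main__":
    module = QuestionSkillInteraction()
    question = torch.randn(2, 64, 768)  # (batch, question tokens, hidden)
    skills = torch.randn(2, 7, 768)     # (batch, candidate skills, hidden)
    q_out, s_out = module(question, skills)
    print(q_out.shape, s_out.shape)
```

Note that the questions are padded or truncated to the 64-token maximum input length quoted above, which is why the toy question tensor uses 64 positions.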