JEC-QA: A Legal-Domain Question Answering Dataset
Authors: Haoxi Zhong, Chaojun Xiao, Cunchao Tu, Tianyang Zhang, Zhiyuan Liu, Maosong Sun (pp. 9701-9708)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct detailed experiments and analysis to investigate the performance of existing question answering models on JEC-QA. By evaluating the performance of these methods on JEC-QA, we show that even the best method can only achieve about 25% and 29% on KD-questions and CA-questions respectively, while skilled humans and unskilled humans can reach 81% and 64% accuracies on JEC-QA. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science and Technology Institute for Artificial Intelligence, Tsinghua University, Beijing, China Beijing National Research Center for Information Science and Technology, China 2Beijing Powerlaw Intelligent Technology Co., Ltd., China |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper states 'We will release JEC-QA and our baselines' and provides a link 'You can access the dataset from http://jecqa.thunlp.org/.', but the link is explicitly for the dataset, not for the source code of their methodology. The phrase 'will release' also implies future, not current, availability. |
| Open Datasets | Yes | We will release JEC-QA and our baselines to help improve the reasoning ability of machine comprehension models. You can access the dataset from http://jecqa.thunlp.org/. |
| Dataset Splits | No | The paper states 'For all experiments, we randomly select 20% of the data as the test dataset.' but does not describe a validation split or a three-way train/validation/test partition. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions software like fastText, Bert Adam, Adam, and Elastic Search, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | For all models, the dimension of word embeddings is w = 200 and the hidden size of model layers is d = 256. We choose K = 6 for experiments and we will discuss the reason in Comparative Analysis. We employ BERT as our topic classifier to select the top-2 relevant topics and retrieve K most relevant reading paragraphs for each topic. Besides, we also retrieve K extra reading paragraphs from Chinese legal provisions. In total, we retrieve 3K paragraphs for each option. |
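The 80/20 partition mentioned in the Dataset Splits row can be sketched as a simple random split. This is a hypothetical illustration, not the authors' code; the function name, seed, and example data are assumptions, and no validation split is produced because the paper reports none.

```python
import random

def train_test_split(examples, test_ratio=0.2, seed=0):
    """Randomly hold out `test_ratio` of the examples as a test set.

    Mirrors the paper's description: 'randomly select 20% of the data
    as the test dataset'. Only a train/test split is returned, since
    the paper does not report a validation split.
    """
    rng = random.Random(seed)           # fixed seed for reproducibility
    shuffled = examples[:]              # copy so the input is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(list(range(100)))
print(len(train), len(test))  # 80 20
```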
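The retrieval budget quoted in the Experiment Setup row (top-2 topics, K paragraphs per topic, plus K from legal provisions, giving 3K per option) can be checked with a small arithmetic sketch. All names here are assumptions for illustration, not the authors' implementation.

```python
K = 6           # paragraphs retrieved per topic (paper: K = 6)
TOP_TOPICS = 2  # BERT topic classifier keeps the top-2 relevant topics

def retrieval_budget(k: int = K, top_topics: int = TOP_TOPICS) -> int:
    """Total reading paragraphs retrieved for one answer option."""
    from_topics = top_topics * k  # K paragraphs for each selected topic
    from_provisions = k           # K extra paragraphs from legal provisions
    return from_topics + from_provisions

print(retrieval_budget())  # 3K = 18 paragraphs per option
```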