Question Difficulty Prediction for READING Problems in Standard Tests

Authors: Zhenya Huang, Qi Liu, Enhong Chen, Hongke Zhao, Mingyong Gao, Si Wei, Yu Su, Guoping Hu

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on a real-world dataset not only show the effectiveness of TACNN, but also give interpretable insights to track the attention information for questions.
Researcher Affiliation | Collaboration | School of Computer Science and Technology, University of Science and Technology of China, {huangzhy, zhhk}@mail.ustc.edu.cn, {qiliuql, cheneh}@ustc.edu.cn; iFLYTEK Research, {mygao2, siwei, gphu}@iflytek.com; School of Computer Science and Technology, Anhui University, yusu@iflytek.com
Pseudocode | No | The paper describes the model architecture and mathematical formulations, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements about providing open-source code for the described methodology, nor does it include links to a code repository.
Open Datasets | No | The experimental dataset supplied by IFLYTEK is collected from real-world standard tests for READING problems, which contains nearly 3 million test logs of thousands of Chinese senior high schools from the year 2014 to 2016.
Dataset Splits | No | To observe how the models behave at different data sparsity, we randomly select 60%, 40%, 20%, 10% of standard tests as testing sets, and the rests as training sets, respectively. (No explicit mention of validation set splits.)
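The quoted split protocol holds out whole standard tests rather than individual questions. A minimal sketch of that procedure, assuming a hypothetical list of test identifiers and a fixed seed for repeatability (the paper does not specify either):

```python
# Sketch of the described protocol: randomly hold out 60%, 40%, 20%,
# and 10% of standard tests as testing sets, training on the rest.
# The test identifiers and the seed are hypothetical placeholders.
import random

tests = [f"test_{i}" for i in range(100)]  # hypothetical standard-test IDs

for ratio in (0.6, 0.4, 0.2, 0.1):
    rng = random.Random(0)  # fixed seed so the split is repeatable
    shuffled = tests[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    testing, training = shuffled[:cut], shuffled[cut:]
    print(f"held out {len(testing)} tests, training on {len(training)}")
```

Because entire tests are held out, every question in a test lands on the same side of the split, matching the stated goal of probing behavior at different data sparsities.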
Hardware Specification | Yes | Both TACNN and baselines are all implemented by Theano (Bergstra et al. 2010) and all experiments are run on a Tesla K20m GPU.
Software Dependencies | No | The paper mentions software like Theano and word2vec but does not provide specific version numbers for these or other software dependencies required for replication.
Experiment Setup | Yes | In TACNN, we set the maximum length M (N) of sentences (words) in documents (sentences) as 25 (40) (zero padded when necessary) according to our observation in Figure 5, i.e., 95% documents (sentences) contains less than 25 (40) sentences (words). Four layers of convolution (three wide convolutions, one narrow convolution) and max-pooling are employed for the Sentence CNN Layer to accommodate the sentence length N, where the numbers of the feature maps for four convolutions are (200, 400, 600, 600) respectively. Also, we set the kernel size k as 3 for all convolution layers and the pooling window p as (3, 3, 2, 1) for each max pooling, respectively. ... we set mini batches as 32 for training and we also use dropout (with probability 0.2) in order to prevent overfitting.
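The quoted setup can be checked arithmetically: with kernel size k = 3, a wide convolution lengthens a sequence to n + k - 1 and a narrow one shortens it to n - k + 1, so the stated conv/pool stack collapses a 40-word sentence to one value per feature map. A sketch under the assumption of non-overlapping max-pooling with floor division (the paper does not spell out the pooling semantics):

```python
# Trace how the Sentence CNN Layer's four conv + max-pool stages
# (three wide convolutions, one narrow, kernel size 3, pooling
# windows 3, 3, 2, 1) reduce the padded sentence length N = 40.
# Non-overlapping floor-division pooling is our assumption.

def conv_len(n, k, wide):
    # wide convolution: output length n + k - 1; narrow: n - k + 1
    return n + k - 1 if wide else n - k + 1

def pool_len(n, p):
    return n // p  # non-overlapping max-pooling (assumed)

N, k = 40, 3                       # max words per sentence, kernel size
wides = [True, True, True, False]  # three wide convs, then one narrow
pools = [3, 3, 2, 1]               # pooling windows from the paper

length = N
for wide, p in zip(wides, pools):
    length = pool_len(conv_len(length, k, wide), p)
    print(length)
# → 14, 5, 3, 1
```

Under these assumptions the final length is 1, i.e. each of the 600 feature maps in the last layer yields a single value, which is consistent with the paper's claim that the stack "accommodates the sentence length N".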