Teaching Machines to Ask Questions
Authors: Kaichun Yao, Libo Zhang, Tiejian Luo, Lili Tao, Yanjun Wu
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model is trained and evaluated on the question-answering dataset SQuAD, and the experimental results show that the proposed model is able to generate diverse and readable questions with the specific attribute. |
| Researcher Affiliation | Academia | ¹ University of the Chinese Academy of Sciences; ² Institute of Software, Chinese Academy of Sciences; ³ University of the West of England |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that open-source code for the described methodology is provided, nor does it provide a link to such code. |
| Open Datasets | Yes | We conduct our experiments on the SQuAD dataset [Rajpurkar et al., 2016], which is used for machine reading comprehension and consists of more than 100,000 questions posed by crowd workers on 536 high-PageRank Wikipedia articles. |
| Dataset Splits | Yes (see the split-count sketch below) | After pre-processing, the extracted training, development and test sets contain 83,889, 5,168 and 5,000 triples, respectively. |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) used for running experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions 'Stanford CoreNLP' and 'glove.840B.300d pre-trained embeddings' but does not provide version numbers for these or for any other software dependencies. |
| Experiment Setup | Yes (see the configuration sketch below) | We set the dimension of word embedding to 300 and use the glove.840B.300d pre-trained embeddings [Pennington et al., 2014] for initialization. The LSTM hidden unit size is set to 300 and the number of LSTM layers is set to 1 in both the encoder and the decoder. We update the model parameters using stochastic gradient descent with a mini-batch size of 64. The learning rates of the generator G and the discriminator D are set to 0.001 and 0.0002, respectively. We clip the gradient when its norm exceeds 5. The scaling factors α and β are set to 0.6 and 0.5. The latent z space size is set to 200. During decoding, we do beam search with a beam size of 3. |
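The split counts in the Dataset Splits row are concrete enough to sanity-check against a re-extraction. Below is a minimal sketch, assuming the SQuAD-derived data has been preprocessed into one (sentence, question, answer) triple per JSON line; the file names and the JSONL layout are hypothetical, since the paper does not release its preprocessing scripts:

```python
import json

# Hypothetical file names; the paper only reports the resulting split sizes.
SPLIT_FILES = {
    "train": "squad_triples.train.jsonl",
    "dev": "squad_triples.dev.jsonl",
    "test": "squad_triples.test.jsonl",
}

# Split sizes reported in the paper after pre-processing.
EXPECTED_SIZES = {"train": 83889, "dev": 5168, "test": 5000}

def count_triples(path):
    """Count one (sentence, question, answer) triple per non-empty JSON line."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())

for split, path in SPLIT_FILES.items():
    n = count_triples(path)
    status = "OK" if n == EXPECTED_SIZES[split] else "MISMATCH"
    print(f"{split}: {n} triples (expected {EXPECTED_SIZES[split]}) [{status}]")
```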
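The Experiment Setup row lists every hyperparameter needed to reconstruct the training configuration. The sketch below collects those values and shows one illustrative update step. It assumes a PyTorch implementation (the paper does not name its framework), and the single encoder LSTM stands in for the paper's full generator/discriminator architecture:

```python
from dataclasses import dataclass

import torch
import torch.nn as nn

@dataclass
class Config:
    # Values taken verbatim from the paper's experiment setup.
    embed_dim: int = 300          # word embedding size, init from glove.840B.300d
    hidden_dim: int = 300         # LSTM hidden unit size
    num_layers: int = 1           # LSTM layers in both encoder and decoder
    batch_size: int = 64          # mini-batch size for SGD
    lr_generator: float = 1e-3    # learning rate of generator G
    lr_discriminator: float = 2e-4  # learning rate of discriminator D
    grad_clip_norm: float = 5.0   # clip gradient when its norm exceeds 5
    alpha: float = 0.6            # scaling factor α
    beta: float = 0.5             # scaling factor β
    latent_dim: int = 200         # size of the latent z space
    beam_size: int = 3            # beam width used at decoding time

cfg = Config()

# Stand-in encoder; the real model also has a decoder and a discriminator.
encoder = nn.LSTM(cfg.embed_dim, cfg.hidden_dim, cfg.num_layers, batch_first=True)

# The paper updates parameters with plain SGD at the listed learning rates.
opt_g = torch.optim.SGD(encoder.parameters(), lr=cfg.lr_generator)

# One illustrative update step with a dummy loss; the paper's actual objective
# combines terms scaled by α and β, which are kept in the config above.
x = torch.randn(cfg.batch_size, 10, cfg.embed_dim)  # fake batch, seq len 10
out, _ = encoder(x)
loss = out.mean()
opt_g.zero_grad()
loss.backward()
nn.utils.clip_grad_norm_(encoder.parameters(), cfg.grad_clip_norm)
opt_g.step()
```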