Automatic Generation of Headlines for Online Math Questions

Authors: Ke Yuan, Dafang He, Zhuoren Jiang, Liangcai Gao, Zhi Tang, C. Lee Giles

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that our model (MathSum) significantly outperforms state-of-the-art models for both the EXEQ-300k and OFEQ-10k datasets.
Researcher Affiliation | Academia | (1) Wangxuan Institute of Computer Technology, Peking University, Beijing, 100080, China; (2) The Pennsylvania State University, University Park, PA 16802, USA; (3) School of Data and Computer Science, Sun Yat-sen University, Guangzhou, 510006, China
Pseudocode | No | The paper describes the model architecture and mathematical formulations but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | https://github.com/yuankepku/MathSum
Open Datasets | Yes | For evaluation, we collect and make available two sets of real-world detailed math questions along with human-written math headlines, namely EXEQ-300k and OFEQ-10k. [...] https://github.com/yuankepku/MathSum
Dataset Splits | Yes | We randomly split EXEQ-300k into training (90%, 261,341), validation (5%, 14,564), and testing (5%, 14,574) sets. In order to get enough testing samples, we split OFEQ-10k into 80% training (10,301), 10% validation (1,124), and 10% testing (1,123). (A minimal split sketch follows the table.)
Hardware Specification | Yes | We implement our model in PyTorch and train on a single Titan X GPU.
Software Dependencies | No | We implement our model in PyTorch and train on a single Titan X GPU. The Stanford CoreNLP toolkit and the LaTeX tokenizer in im2markup are used to tokenize the text and the equations in questions and headlines separately. For TextRank, we use the implementation in summanlp (https://github.com/summanlp/textrank). We use the implementation of OpenNMT (https://github.com/OpenNMT/OpenNMT-py). The BLEU and METEOR scores are calculated using the nlg-eval package, and the ROUGE scores are based on the rouge-baselines package. While several software components are mentioned, specific version numbers are not provided for any of them. (A hedged evaluation sketch using nlg-eval follows the table.)
Experiment Setup | Yes | For our experiments, the dimensionality of the word embeddings is 128 and the number of hidden states of the LSTM units for both encoder and decoder is 512. The multi-head attention block contains 4 heads and 256-dimensional hidden states for the feed-forward part. The model is trained using AdaGrad (Duchi, Hazan, and Singer 2011) with a learning rate of 0.2, an initial accumulator value of 0.1, and a batch size of 16. We also set the dropout rate to 0.3. The vocabulary sizes of the questions and headlines are both 50,000. At test time, we decode the math headline using beam search with a beam size of 3. We set the minimum length to 20 tokens on EXEQ-300k and 15 tokens on OFEQ-10k. (A hyperparameter sketch in PyTorch follows the table.)
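
The split proportions reported in the Dataset Splits row (90/5/5 for EXEQ-300k, 80/10/10 for OFEQ-10k) can be reproduced in spirit with a simple random shuffle. The snippet below is a minimal sketch, not the authors' script; the file names and the JSON-lines format are assumptions.

```python
import json
import random

def split_dataset(path, train_frac, valid_frac, seed=42):
    """Randomly split a JSON-lines file of (question, headline) pairs.

    Only the split proportions come from the paper; the file layout
    and the fixed seed are assumptions for illustration.
    """
    with open(path, encoding="utf-8") as f:
        examples = [json.loads(line) for line in f]

    random.Random(seed).shuffle(examples)
    n = len(examples)
    n_train = int(n * train_frac)
    n_valid = int(n * valid_frac)

    train = examples[:n_train]
    valid = examples[n_train:n_train + n_valid]
    test = examples[n_train + n_valid:]
    return train, valid, test

# EXEQ-300k: 90% train / 5% validation / 5% test
# train, valid, test = split_dataset("exeq_300k.jsonl", 0.90, 0.05)
# OFEQ-10k: 80% train / 10% validation / 10% test
# train, valid, test = split_dataset("ofeq_10k.jsonl", 0.80, 0.10)
```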
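The Software Dependencies row states that BLEU and METEOR are computed with the nlg-eval package. Below is a minimal usage sketch assuming plain-text hypothesis and reference files with one headline per line; the file names are placeholders, and this is not the authors' evaluation script (the paper's ROUGE numbers come from rouge-baselines, not from nlg-eval's ROUGE_L).

```python
# Setup (per the nlg-eval README):
#   pip install git+https://github.com/Maluuba/nlg-eval.git
#   nlg-eval --setup   # downloads metric dependencies
from nlgeval import compute_metrics

# hyp.txt: one generated headline per line (placeholder name)
# ref.txt: the corresponding human-written headline on the same line (placeholder name)
metrics = compute_metrics(hypothesis="hyp.txt", references=["ref.txt"])
print(metrics)  # dict including Bleu_1..Bleu_4 and METEOR, among others
```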
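The hyperparameters in the Experiment Setup row map onto a small PyTorch configuration. The sketch below only instantiates the reported dimensions and optimizer settings; the actual MathSum encoder-decoder (including its pointer mechanism and the exact placement of the multi-head attention block) is more involved, and the module layout here is an assumption.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 50_000   # question and headline vocabularies (both 50,000)
EMB_DIM    = 128      # word-embedding dimensionality
HIDDEN     = 512      # LSTM hidden size for encoder and decoder
DROPOUT    = 0.3
N_HEADS    = 4        # heads in the multi-head attention block
FF_DIM     = 256      # feed-forward hidden size in the attention block

embedding = nn.Embedding(VOCAB_SIZE, EMB_DIM)
encoder   = nn.LSTM(EMB_DIM, HIDDEN, batch_first=True)
decoder   = nn.LSTM(EMB_DIM, HIDDEN, batch_first=True)
attention = nn.MultiheadAttention(HIDDEN, num_heads=N_HEADS, dropout=DROPOUT)
feed_forward = nn.Sequential(          # assumed layout of the 256-dim feed-forward part
    nn.Linear(HIDDEN, FF_DIM),
    nn.ReLU(),
    nn.Dropout(DROPOUT),
    nn.Linear(FF_DIM, HIDDEN),
)

params = (list(embedding.parameters()) + list(encoder.parameters()) +
          list(decoder.parameters()) + list(attention.parameters()) +
          list(feed_forward.parameters()))

# AdaGrad with the reported learning rate and initial accumulator value
optimizer = torch.optim.Adagrad(params, lr=0.2, initial_accumulator_value=0.1)

BATCH_SIZE = 16
BEAM_SIZE  = 3                                    # beam search at test time
MIN_LEN    = {"EXEQ-300k": 20, "OFEQ-10k": 15}    # minimum decode lengths
```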