Automatic Generation of Headlines for Online Math Questions

Authors: Ke Yuan, Dafang He, Zhuoren Jiang, Liangcai Gao, Zhi Tang, C. Lee Giles

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that our model (MathSum) significantly outperforms state-of-the-art models for both the EXEQ-300k and OFEQ-10k datasets.
Researcher Affiliation | Academia | (1) Wangxuan Institute of Computer Technology, Peking University, Beijing, 100080, China; (2) The Pennsylvania State University, University Park, PA 16802, USA; (3) School of Data and Computer Science, Sun Yat-sen University, Guangzhou, 510006, China
Pseudocode | No | The paper describes the model architecture and mathematical formulations but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | https://github.com/yuankepku/MathSum
Open Datasets | Yes | For evaluation, we collect and make available two sets of real-world detailed math questions along with human-written math headlines, namely EXEQ-300k and OFEQ-10k. [...] https://github.com/yuankepku/MathSum
Dataset Splits | Yes | We randomly split EXEQ-300k into training (90%, 261,341), validation (5%, 14,564), and testing (5%, 14,574) sets. In order to get enough testing samples, we split OFEQ-10k into 80% training (10,301), 10% validation (1,124), and 10% testing (1,123). (A minimal split sketch follows the table.)
Hardware Specification | Yes | We implement our model in PyTorch and train on a single Titan X GPU.
Software Dependencies | No | We implement our model in PyTorch and train on a single Titan X GPU. The Stanford CoreNLP toolkit and the LaTeX tokenizer in im2markup are used to tokenize the text and the equations in questions and headlines separately. For TextRank, we use the implementation in summanlp (https://github.com/summanlp/textrank). We use the implementation of OpenNMT (https://github.com/OpenNMT/OpenNMT-py). The BLEU and METEOR scores are calculated using the nlg-eval package, and the ROUGE scores are based on the rouge-baselines package. While several software components are mentioned, specific version numbers are not provided for any of them. (A hedged evaluation sketch using nlg-eval follows the table.)
Experiment Setup | Yes | For our experiments, the dimensionality of the word embeddings is 128 and the number of hidden states of the LSTM units for both encoder and decoder is 512. The multi-head attention block contains 4 heads and 256-dimensional hidden states for the feed-forward part. The model is trained using AdaGrad (Duchi, Hazan, and Singer 2011) with a learning rate of 0.2, an initial accumulator value of 0.1, and a batch size of 16. We also set the dropout rate to 0.3. The vocabulary sizes of the questions and headlines are both 50,000. At test time, we decode the math headline using beam search with a beam size of 3. We set the minimum length to 20 tokens on EXEQ-300k and 15 tokens on OFEQ-10k. (A hyperparameter sketch in PyTorch follows the table.)
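
The split proportions reported in the Dataset Splits row (90/5/5 for EXEQ-300k, 80/10/10 for OFEQ-10k) can be reproduced in spirit with a simple random shuffle. The snippet below is a minimal sketch, not the authors' script; the file names and the JSON-lines format are assumptions.

```python
import json
import random

def split_dataset(path, train_frac, valid_frac, seed=42):
    """Randomly split a JSON-lines file of (question, headline) pairs.

    Only the split proportions come from the paper; the file layout
    and the fixed seed are assumptions for illustration.
    """
    with open(path, encoding="utf-8") as f:
        examples = [json.loads(line) for line in f]

    random.Random(seed).shuffle(examples)
    n = len(examples)
    n_train = int(n * train_frac)
    n_valid = int(n * valid_frac)

    train = examples[:n_train]
    valid = examples[n_train:n_train + n_valid]
    test = examples[n_train + n_valid:]
    return train, valid, test

# EXEQ-300k: 90% train / 5% validation / 5% test
# train, valid, test = split_dataset("exeq_300k.jsonl", 0.90, 0.05)
# OFEQ-10k: 80% train / 10% validation / 10% test
# train, valid, test = split_dataset("ofeq_10k.jsonl", 0.80, 0.10)
```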
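The Software Dependencies row states that BLEU and METEOR are computed with the nlg-eval package. Below is a minimal usage sketch assuming plain-text hypothesis and reference files with one headline per line; the file names are placeholders, and this is not the authors' evaluation script (the paper's ROUGE numbers come from rouge-baselines, not from nlg-eval's ROUGE_L).

```python
# Setup (per the nlg-eval README):
#   pip install git+https://github.com/Maluuba/nlg-eval.git
#   nlg-eval --setup   # downloads metric dependencies
from nlgeval import compute_metrics

# hyp.txt: one generated headline per line (placeholder name)
# ref.txt: the corresponding human-written headline on the same line (placeholder name)
metrics = compute_metrics(hypothesis="hyp.txt", references=["ref.txt"])
print(metrics)  # dict including Bleu_1..Bleu_4 and METEOR, among others
```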
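The hyperparameters in the Experiment Setup row map onto a small PyTorch configuration. The sketch below only instantiates the reported dimensions and optimizer settings; the actual MathSum encoder-decoder (including its pointer mechanism and the exact placement of the multi-head attention block) is more involved, and the module layout here is an assumption.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 50_000   # question and headline vocabularies (both 50,000)
EMB_DIM    = 128      # word-embedding dimensionality
HIDDEN     = 512      # LSTM hidden size for encoder and decoder
DROPOUT    = 0.3
N_HEADS    = 4        # heads in the multi-head attention block
FF_DIM     = 256      # feed-forward hidden size in the attention block

embedding = nn.Embedding(VOCAB_SIZE, EMB_DIM)
encoder   = nn.LSTM(EMB_DIM, HIDDEN, batch_first=True)
decoder   = nn.LSTM(EMB_DIM, HIDDEN, batch_first=True)
attention = nn.MultiheadAttention(HIDDEN, num_heads=N_HEADS, dropout=DROPOUT)
feed_forward = nn.Sequential(          # assumed layout of the 256-dim feed-forward part
    nn.Linear(HIDDEN, FF_DIM),
    nn.ReLU(),
    nn.Dropout(DROPOUT),
    nn.Linear(FF_DIM, HIDDEN),
)

params = (list(embedding.parameters()) + list(encoder.parameters()) +
          list(decoder.parameters()) + list(attention.parameters()) +
          list(feed_forward.parameters()))

# AdaGrad with the reported learning rate and initial accumulator value
optimizer = torch.optim.Adagrad(params, lr=0.2, initial_accumulator_value=0.1)

BATCH_SIZE = 16
BEAM_SIZE  = 3                                    # beam search at test time
MIN_LEN    = {"EXEQ-300k": 20, "OFEQ-10k": 15}    # minimum decode lengths
```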