Template-Based Math Word Problem Solvers with Recursive Neural Networks

Authors: Lei Wang, Dongxiang Zhang, Jipeng Zhang, Xing Xu, Lianli Gao, Bing Tian Dai, Heng Tao Shen

AAAI 2019, pp. 7144-7151 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results clearly establish the superiority of our new framework as we improve the accuracy by a wide margin in two of the largest datasets, i.e., from 58.1% to 66.9% in Math23K and from 62.8% to 66.8% in MAWPS. We conduct experiments on two of the largest datasets for arithmetic word problems, in which Math23K contains 23,164 math problems and MAWPS contains 2,373 problems. (A hedged answer-accuracy check is sketched after this table.)
Researcher Affiliation | Academia | Lei Wang (1), Dongxiang Zhang (1,2), Jipeng Zhang (1), Xing Xu (1,2), Lianli Gao (1), Bing Tian Dai (3), Heng Tao Shen (1). (1) Center for Future Media and School of Computer Science & Engineering, UESTC; (2) Afanti Research; (3) School of Information Systems, Singapore Management University. Emails: {demolwang,zhangjipeng20}@std.uestc.edu.cn, {zhangdo,xing.xu,lianli.gao}@uestc.edu.cn, btdai@smu.edu.sg, shenhengtao@hotmail.com
Pseudocode | No | The paper describes the model architecture and processes but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | "We release the source code of our model in Github" (https://github.com/uestc-db/T-RNN).
Open Datasets | Yes | MAWPS (Koncel-Kedziorski et al. 2016) is another testbed for arithmetic word problems with one unknown variable in the question... combines the published word problem datasets used in (Hosseini et al. 2014; Kushman et al. 2014; Koncel-Kedziorski et al. 2015; Roy and Roth 2015). Math23K (Wang, Liu, and Shi 2017) contains Chinese math word problems for elementary school students and is crawled from multiple online education websites.
Dataset Splits | Yes | Since Math23K has split the problems into training and test datasets when it was published, we simply follow its original setup. For MAWPS, we use 5-fold cross validation. (A cross-validation split is sketched after this table.)
Hardware Specification | Yes | All the experiments were conducted on the same server, with 4 CPU cores (Intel Xeon CPU E5-2650 with 2.30GHz) and 32GB memory.
Software Dependencies | No | The paper describes the neural network architectures (e.g., Bi-LSTM, LSTM) and optimizers (Adam, SGD) along with their parameters (e.g., learning rate, hidden units), but it does not specify the software libraries or frameworks used for the implementation (such as TensorFlow or PyTorch), nor their version numbers.
Experiment Setup | Yes | In the template prediction module, we use a pre-trained word embedding with 128 units, a two-layer Bi-LSTM with 256 hidden units as encoder, and a two-layer LSTM with 512 hidden units as decoder. As to the optimizer, we use Adam with learning rate set to 1e-3, β1 = 0.9 and β2 = 0.99. In the answer generation module, we use an embedding layer with 100 units and a two-layer Bi-LSTM with 160 hidden units. SGD with learning rate 0.01 and momentum factor 0.9 is used to optimize this module. In both components, the number of epochs, mini-batch size and dropout rate are set to 100, 32 and 0.5, respectively. (A hedged configuration sketch appears after this table.)
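
The accuracies quoted in the Research Type row are answer accuracies: a prediction counts as correct when the value of the generated equation matches the labelled answer. The check below is a minimal sketch under that reading; the tolerance, helper names, and use of Python's eval on trusted arithmetic strings are illustrative assumptions, not the authors' released code.

def is_correct(pred_expression, gold_answer, tol=1e-4):
    # Evaluate the predicted arithmetic expression (assumed to be a trusted
    # infix string such as "(3 + 5) * 2") and compare it with the gold answer.
    try:
        pred_value = eval(pred_expression)
    except (SyntaxError, NameError, ZeroDivisionError):
        return False
    return abs(pred_value - gold_answer) < tol

def answer_accuracy(pred_expressions, gold_answers):
    # Fraction of problems whose predicted equation yields the labelled answer.
    correct = sum(is_correct(p, a) for p, a in zip(pred_expressions, gold_answers))
    return correct / len(gold_answers)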
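
The Dataset Splits row notes that Math23K keeps its published train/test split while MAWPS is evaluated with 5-fold cross-validation. Below is a minimal sketch of such a split, assuming the MAWPS problems have been loaded into a Python list; the shuffling and seed handling are assumptions, not details taken from the released code.

from sklearn.model_selection import KFold

def five_fold_splits(problems, seed=0):
    # Yield (train, test) problem lists for 5-fold cross-validation, as used
    # for MAWPS; Math23K instead keeps its originally published split.
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(problems):
        yield [problems[i] for i in train_idx], [problems[i] for i in test_idx]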
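
The Experiment Setup row lists the hyper-parameters of the two modules, but, as the Software Dependencies row notes, no framework is named. The sketch below restates those settings as PyTorch modules and optimizers purely for illustration; PyTorch itself, the vocabulary sizes, and the class names are assumptions, while the layer dimensions, optimizer choices, and learning rates come from the quoted setup.

import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB = 4000, 30  # hypothetical vocabulary sizes, not given in the paper

class TemplatePredictor(nn.Module):
    # Template prediction module: 128-unit embedding, two-layer Bi-LSTM encoder
    # with 256 hidden units, two-layer LSTM decoder with 512 hidden units.
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(SRC_VOCAB, 128)
        self.encoder = nn.LSTM(128, 256, num_layers=2, bidirectional=True,
                               dropout=0.5, batch_first=True)
        self.decoder = nn.LSTM(512, 512, num_layers=2, dropout=0.5, batch_first=True)
        self.out = nn.Linear(512, TGT_VOCAB)

class AnswerGenerator(nn.Module):
    # Answer generation module: 100-unit embedding, two-layer Bi-LSTM with 160 hidden units.
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(SRC_VOCAB, 100)
        self.encoder = nn.LSTM(100, 160, num_layers=2, bidirectional=True,
                               dropout=0.5, batch_first=True)

template_model = TemplatePredictor()
answer_model = AnswerGenerator()

# Optimizers as described in the paper: Adam (lr 1e-3, betas 0.9/0.99) for template
# prediction, SGD (lr 0.01, momentum 0.9) for answer generation; both modules are
# trained for 100 epochs with mini-batch size 32 and dropout 0.5.
adam = torch.optim.Adam(template_model.parameters(), lr=1e-3, betas=(0.9, 0.99))
sgd = torch.optim.SGD(answer_model.parameters(), lr=0.01, momentum=0.9)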