MathDQN: Solving Arithmetic Word Problems via Deep Reinforcement Learning

Authors: Lei Wang, Dongxiang Zhang, Lianli Gao, Jingkuan Song, Long Guo, Heng Tao Shen

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results validate our superiority over state-of-the-art methods. Our MathDQN yields remarkable improvement on most of the datasets and boosts the average precision among all the benchmark datasets by 15%. In this section, we evaluate the proposed DQN framework on three publicly available datasets of arithmetic word problems. We evaluate both accuracy and efficiency by comparing with state-of-the-art methods.
Researcher Affiliation | Academia | Lei Wang, Dongxiang Zhang, Lianli Gao, Jingkuan Song, Long Guo, Heng Tao Shen. Center for Future Media and School of Computer Science & Engineering, UESTC, China; Key Lab of High Confidence Software Technologies (MOE), Peking University, China.
Pseudocode | Yes | Algorithm 1: Training Procedure via Deep Q-Network (a hedged training-loop sketch in this spirit follows the table).
Open Source Code | Yes | We make our implementation code available on GitHub: https://github.com/uestc-db/DQN_Word_Problem_Solver
Open Datasets | Yes | In this section, we evaluate the proposed DQN framework on three publicly available datasets of arithmetic word problems. We apply the same set of benchmark datasets on arithmetic math problems as used in the state-of-the-art work (Roy and Roth 2015): 1. AI2 (Hosseini et al. 2014); 2. IL (Roy, Vieira, and Roth 2015); 3. CC (Roy and Roth 2015).
Dataset Splits | No | The paper does not explicitly describe how the data was split into training, validation, and test sets. It mentions a training dataset and evaluation on the benchmark (test) datasets, but no distinct validation set.
Hardware Specification | Yes | All the experiments were conducted on the same server, with 4 CPU cores (Intel Xeon CPU E5-2650 at 2.30 GHz) and 32 GB memory.
Software Dependencies | No | The paper states that "The DQN model was implemented on top of TensorFlow" but does not give a version number for TensorFlow or any other software dependency.
Experiment Setup | Yes | In our DQN model, we set the size of replay memory D to 15,000 and the discount factor γ = 0.9. The DQN model was implemented on top of TensorFlow and the learning rate is set to 0.0001 for RMSProp. To adjust the trade-off between exploration and exploitation, we reduce ε in the greedy strategy from 0.5 to 0.01 over 30,000 epochs. A mini-batch gradient update is executed after every step of expression-tree construction, and we set the mini-batch size to 32. The feed-forward neural network contains 2 hidden layers, each with 50 dimensions. (A hedged code sketch of these settings follows the table.)
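
To make the reported setup concrete, the Experiment Setup row maps onto a Q-network definition roughly as follows. This is a minimal sketch using the TensorFlow 2 Keras API (the paper predates TF2 and reports only the layer sizes); STATE_DIM, N_ACTIONS, and the ReLU activation are assumptions for illustration, not values from the paper.

```python
import tensorflow as tf

# Hypothetical dimensions: the paper does not report the state-feature
# dimensionality or the size of the action space.
STATE_DIM = 100
N_ACTIONS = 4

def build_q_network():
    """Feed-forward Q-network with 2 hidden layers of 50 units each,
    as reported in the paper. The hidden activation is not specified
    in the paper; ReLU is assumed here."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(STATE_DIM,)),
        tf.keras.layers.Dense(50, activation="relu"),
        tf.keras.layers.Dense(50, activation="relu"),
        tf.keras.layers.Dense(N_ACTIONS),  # one Q-value per candidate action
    ])

q_network = build_q_network()
# RMSProp with learning rate 0.0001, as reported in the paper.
optimizer = tf.keras.optimizers.RMSprop(learning_rate=1e-4)
```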
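
The remaining reported settings (replay memory of 15,000, γ = 0.9, mini-batch of 32, ε annealed from 0.5 to 0.01 over 30,000 epochs, one gradient update per construction step) fit a standard DQN training loop, sketched below in the spirit of the paper's Algorithm 1. It reuses q_network, optimizer, and N_ACTIONS from the sketch above. The environment interaction, i.e. the paper's expression-tree construction and state featurization, is out of scope and not shown; the shape of the ε schedule is not stated in the paper, so a linear anneal is assumed.

```python
import random
from collections import deque

import numpy as np

REPLAY_CAPACITY = 15_000   # size of replay memory D (paper)
GAMMA = 0.9                # discount factor (paper)
BATCH_SIZE = 32            # mini-batch size (paper)
EPS_START, EPS_END = 0.5, 0.01
EPS_DECAY_STEPS = 30_000   # epsilon annealed over 30,000 epochs (paper)

# Each entry is a (state, action, reward, next_state, done) transition.
replay_memory = deque(maxlen=REPLAY_CAPACITY)

def epsilon_at(step):
    """Anneal epsilon from 0.5 to 0.01; the decay shape is not stated
    in the paper, so a linear schedule is assumed."""
    frac = min(step / EPS_DECAY_STEPS, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)

def select_action(state, step):
    """Epsilon-greedy action selection over the Q-network outputs."""
    if random.random() < epsilon_at(step):
        return random.randrange(N_ACTIONS)
    q_values = q_network(state[None, :], training=False)
    return int(tf.argmax(q_values[0]))

def train_step():
    """One mini-batch gradient update, executed after every step of
    expression-tree construction (paper)."""
    if len(replay_memory) < BATCH_SIZE:
        return
    batch = random.sample(replay_memory, BATCH_SIZE)
    states, actions, rewards, next_states, dones = zip(*batch)
    states = np.asarray(states, dtype=np.float32)
    actions = np.asarray(actions, dtype=np.int32)
    rewards = np.asarray(rewards, dtype=np.float32)
    next_states = np.asarray(next_states, dtype=np.float32)
    dones = np.asarray(dones, dtype=np.float32)

    # Standard one-step TD target; terminal transitions (done = 1)
    # collapse to the immediate reward.
    next_q = tf.reduce_max(q_network(next_states), axis=1)
    targets = rewards + GAMMA * (1.0 - dones) * next_q
    with tf.GradientTape() as tape:
        q = q_network(states)
        q_taken = tf.reduce_sum(q * tf.one_hot(actions, N_ACTIONS), axis=1)
        loss = tf.reduce_mean(tf.square(targets - q_taken))
    grads = tape.gradient(loss, q_network.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_network.trainable_variables))
```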