Solving Math Word Problems with Teacher Supervision

Authors: Zhenwen Liang, Xiangliang Zhang

IJCAI 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on two benchmark MWPs datasets verified that our proposed solution outperforms the state-of-the-art models. |
| Researcher Affiliation | Academia | King Abdullah University of Science and Technology (KAUST), Saudi Arabia |
| Pseudocode | Yes | Algorithm 1 Negative Answer Generation |
| Open Source Code | Yes | Our code in Python with Pytorch framework can be found at https://github.com/derderking/MWP-teacher. |
| Open Datasets | Yes | We take two widely used benchmark datasets for the experimental evaluation, Math23K and MAWPS. Math23k [Wang et al., 2017] is one of the most commonly used dataset for MWP solver evaluation. It has 23161 math word problems. MAWPS [Koncel-Kedziorski et al., 2016] is a relatively small dataset which only contains 2373 problems. |
| Dataset Splits | Yes | There are also others using a 5-fold cross validation to measure the performance of their solvers. In our experiments, we report the accuracy for both settings. We also perform 5-fold cross validation on this dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU specifications, or memory. |
| Software Dependencies | No | The paper mentions "Our code in Python with Pytorch framework" but does not specify version numbers for Python, Pytorch, or any other libraries used, which is required for reproducibility. |
| Experiment Setup | Yes | The embedding dimension of Wembed is set to 128. The latent feature dimension d is set to 512. Our models are trained for 120 epochs. The training was conducted in two stages. We use Adam optimizer [Kingma and Ba, 2014] with initial learning rate 0.001, which is halved every 30 epochs. Dropout [Hinton et al., 2012] on embedding matrix of probability 0.5 is employed to prevent overfitting. During testing, we use beam search of size 8 to generate the math expression sequence. The weight α in Eq. (7) is set to 0.1, by its sensitivity analysis. |
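
The reported experiment setup can be gathered into a small configuration, with the learning-rate schedule written out explicitly. The following is a minimal sketch, not the authors' code: the dictionary keys and the `learning_rate` helper are hypothetical names chosen for illustration, and it assumes epochs are counted from 0 so that the first halving takes effect at epoch 30. The two-stage training mentioned in the setup is not modeled here, since the summary does not say how the stages divide the 120 epochs.

```python
# Hyperparameters as reported in the Experiment Setup entry.
# Key names are illustrative assumptions, not identifiers from the authors' repository.
CONFIG = {
    "embedding_dim": 128,    # dimension of the embedding matrix W_embed
    "hidden_dim": 512,       # latent feature dimension d
    "epochs": 120,
    "initial_lr": 1e-3,      # Adam initial learning rate
    "lr_halve_every": 30,    # halve the learning rate every 30 epochs
    "dropout": 0.5,          # dropout on the embedding matrix
    "beam_size": 8,          # beam search width at test time
    "alpha": 0.1,            # weight alpha in Eq. (7)
}


def learning_rate(epoch, initial_lr=CONFIG["initial_lr"], step=CONFIG["lr_halve_every"]):
    """Step decay: halve the learning rate once per `step` epochs (epoch is 0-indexed)."""
    return initial_lr * 0.5 ** (epoch // step)


if __name__ == "__main__":
    # Print the learning rate at the start of each 30-epoch block.
    for epoch in range(0, CONFIG["epochs"], CONFIG["lr_halve_every"]):
        print(f"epoch {epoch:3d}: lr = {learning_rate(epoch):.6f}")
```

In PyTorch, the same step decay is usually expressed by attaching `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)` to a `torch.optim.Adam` optimizer created with `lr=0.001`. Whether the authors' code does it this way is not stated in the summary.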