Solving Math Word Problems with Teacher Supervision
Authors: Zhenwen Liang, Xiangliang Zhang
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on two benchmark MWP datasets verified that our proposed solution outperforms the state-of-the-art models. |
| Researcher Affiliation | Academia | King Abdullah University of Science and Technology (KAUST), Saudi Arabia |
| Pseudocode | Yes | Algorithm 1 Negative Answer Generation |
| Open Source Code | Yes | Our code in Python with Pytorch framework can be found at https://github.com/derderking/MWP-teacher. |
| Open Datasets | Yes | We take two widely used benchmark datasets for the experimental evaluation, Math23K and MAWPS. Math23K [Wang et al., 2017] is one of the most commonly used datasets for MWP solver evaluation. It has 23161 math word problems. MAWPS [Koncel-Kedziorski et al., 2016] is a relatively small dataset that contains only 2373 problems. |
| Dataset Splits | Yes | Some prior work uses 5-fold cross validation to measure solver performance. In our experiments, we report the accuracy for both settings. We also perform 5-fold cross validation on this dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU specifications, or memory. |
| Software Dependencies | No | The paper mentions "Our code in Python with Pytorch framework" but does not specify version numbers for Python, Pytorch, or any other libraries used, which is required for reproducibility. |
| Experiment Setup | Yes | The embedding dimension of Wembed is set to 128. The latent feature dimension d is set to 512. Our models are trained for 120 epochs. The training was conducted in two stages. We use the Adam optimizer [Kingma and Ba, 2014] with initial learning rate 0.001, which is halved every 30 epochs. Dropout [Hinton et al., 2012] of probability 0.5 on the embedding matrix is employed to prevent overfitting. During testing, we use beam search of size 8 to generate the math expression sequence. The weight α in Eq. (7) is set to 0.1, based on a sensitivity analysis. |
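The 5-fold cross-validation protocol quoted above can be sketched in plain Python. This is an illustrative splitter, not the authors' code; the function name `five_fold_splits` and the contiguous-fold strategy are assumptions (papers often shuffle first), with 23161 standing in for the Math23K size.

```python
def five_fold_splits(n_items, k=5):
    """Yield (train_indices, test_indices) for k-fold cross validation.

    Folds are contiguous index ranges; the last fold absorbs the
    remainder when n_items is not divisible by k.
    """
    indices = list(range(n_items))
    fold_size = n_items // k
    for i in range(k):
        start = i * fold_size
        end = start + fold_size if i < k - 1 else n_items
        test = indices[start:end]
        train = indices[:start] + indices[end:]
        yield train, test

# Example: split the 23161 Math23K problems into 5 folds.
splits = list(five_fold_splits(23161))
```

Each problem appears in exactly one test fold, so averaging accuracy over the five folds uses every example for evaluation once.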
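The learning-rate schedule in the setup row (initial rate 0.001, halved every 30 epochs over 120 epochs) can be written as a step decay. This is a minimal sketch of the schedule itself, not the authors' training code; the helper name `lr_at_epoch` is hypothetical.

```python
def lr_at_epoch(epoch, base_lr=0.001, halve_every=30):
    """Step-decay schedule: halve the learning rate every `halve_every` epochs."""
    return base_lr * (0.5 ** (epoch // halve_every))

# Learning rate over the 120 training epochs described in the paper.
schedule = [lr_at_epoch(e) for e in range(120)]
```

In PyTorch this step decay corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)` wrapped around an `Adam` optimizer created with `lr=0.001`.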