reproducibilityindex.ai

Solving Math Word Problems with Teacher Supervision

Authors: Zhenwen Liang, Xiangliang Zhang

IJCAI 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results on two benchmark MWP datasets verified that our proposed solution outperforms the state-of-the-art models.
Researcher Affiliation	Academia	King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Pseudocode	Yes	Algorithm 1 Negative Answer Generation
Open Source Code	Yes	Our code in Python with the PyTorch framework can be found at https://github.com/derderking/MWP-teacher.
Open Datasets	Yes	We take two widely used benchmark datasets for the experimental evaluation, Math23K and MAWPS. Math23K [Wang et al., 2017] is one of the most commonly used datasets for MWP solver evaluation. It has 23161 math word problems. MAWPS [Koncel-Kedziorski et al., 2016] is a relatively small dataset which only contains 2373 problems.
Dataset Splits	Yes	Others also use 5-fold cross-validation to measure the performance of their solvers. In our experiments, we report the accuracy for both settings. We also perform 5-fold cross-validation on this dataset.
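The 5-fold cross-validation setting mentioned above can be illustrated with a minimal sketch. It assumes a single random shuffle with a fixed seed and folds whose sizes differ by at most one. The source does not state how the folds were drawn, and `k_fold_splits` is a hypothetical helper, not code from the MWP-teacher repository.

```python
import random
from typing import List, Tuple


def k_fold_splits(n_items: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: shuffle problem indices once, then cut them into k folds.

    Each fold is the test set exactly once; the remaining k-1 folds form the
    training set. The shuffling and seeding scheme is an assumption, not the paper's.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits


# Example with the MAWPS size reported above (2373 problems).
splits = k_fold_splits(2373, k=5)
print([len(test) for _, test in splits])
```

For 2373 problems the test folds hold 475, 475, 475, 474 and 474 items. A common convention, assumed here and not stated in the source, is to report the mean accuracy over the five folds.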
Hardware Specification	No	The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU specifications, or memory.
Software Dependencies	No	The paper mentions "Our code in Python with Pytorch framework" but does not specify version numbers for Python, PyTorch, or any other libraries used, which is required for reproducibility.
Experiment Setup	Yes	The embedding dimension of W_embed is set to 128. The latent feature dimension d is set to 512. Our models are trained for 120 epochs. The training was conducted in two stages. We use the Adam optimizer [Kingma and Ba, 2014] with an initial learning rate of 0.001, which is halved every 30 epochs. Dropout [Hinton et al., 2012] with probability 0.5 is applied to the embedding matrix to prevent overfitting. During testing, we use beam search of size 8 to generate the math expression sequence. The weight α in Eq. (7) is set to 0.1, based on its sensitivity analysis.
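The reported hyperparameters can be collected into a configuration object, and the step-decay learning rate can be computed from it. The sketch below is a minimal example. The field names and the `learning_rate` helper are hypothetical and are not taken from the authors' code. It assumes 0-indexed epochs with the rate halved at epochs 30, 60 and 90. It does not model the two training stages, because the source gives no details about them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values as reported in the experiment setup; field names are hypothetical.
    embed_dim: int = 128          # dimension of W_embed
    hidden_dim: int = 512         # latent feature dimension d
    epochs: int = 120
    initial_lr: float = 1e-3      # Adam initial learning rate
    lr_halving_period: int = 30   # learning rate halved every 30 epochs
    embed_dropout: float = 0.5    # dropout on the embedding matrix
    beam_size: int = 8            # beam search size at test time
    alpha: float = 0.1            # weight alpha in Eq. (7)


def learning_rate(epoch: int, cfg: TrainConfig = TrainConfig()) -> float:
    """Step decay (assumed 0-indexed epochs): initial_lr * 0.5 ** (epoch // period)."""
    return cfg.initial_lr * 0.5 ** (epoch // cfg.lr_halving_period)


cfg = TrainConfig()
for epoch in (0, 30, 60, 90, cfg.epochs - 1):
    print(epoch, learning_rate(epoch, cfg))
```

Over the 120 epochs this yields four rates: 1e-3, 5e-4, 2.5e-4 and 1.25e-4. In PyTorch, the same schedule is usually expressed as `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)`, stepped once per epoch. Whether the authors used that scheduler is an assumption.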