Generalizing Math Word Problem Solvers via Solution Diversification

Authors: Zhenwen Liang, Jipeng Zhang, Lei Wang, Yan Wang, Jie Shao, Xiangliang Zhang

AAAI 2023

Reproducibility Variable Result LLM Response
Research Type Experimental We conduct extensive experiments on a benchmark dataset Math23k and a new dataset named Weak12k, and show that our framework improves the performance of various MWP solvers under different settings by generating correct and diverse solutions.
Researcher Affiliation Collaboration (1) University of Notre Dame; (2) Hong Kong University of Science and Technology; (3) Singapore Management University; (4) Tencent AI Lab; (5) University of Electronic Science and Technology of China
Pseudocode Yes Algorithm 1: Weak Data Augmentation; Alg. 2.
Open Source Code Yes The code and data can be found in footnote 1: https://github.com/LZhenwen/Solution Diversity
Open Datasets Yes We curate and release a novel math word problem (MWP) dataset called Weak12k with 12,117 MWPs. This dataset will be released to the public upon paper acceptance to facilitate future studies like semi-weakly supervised solver development.
Dataset Splits Yes We report the performance of 5-fold cross-validation on it following (Xie and Sun 2019) and (Hong et al. 2021a).
Hardware Specification Yes We use Pytorch to construct the code and the NVIDIA RTX 2080Ti graphic card to train the solvers.
Software Dependencies No The paper states 'We use Pytorch to construct the code' but does not provide specific version numbers for Pytorch or any other software dependencies.
Experiment Setup Yes The dimension of the embedding matrix is 128, and the dimension of all hidden features is 512. We train the model for 200 epochs with the Adam optimizer (Kingma and Ba 2014) and a learning rate of 0.001, which is halved every 30 epochs. For the first 100 epochs we use ai = si, and for the remaining epochs (si + twsi)/2. We update the solution buffer every 5 epochs of parameter learning, to leave sufficient time to train the model between solution buffer updates.
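
The evaluation protocol and training schedule reported in the table can be sketched in plain Python. This is a minimal illustration, not the authors' code: all function and constant names are hypothetical, epochs are assumed to be 0-indexed, the buffer refresh is assumed to happen at the end of every fifth epoch, and the folds are assumed to be contiguous and unshuffled. Only the numeric values (200 epochs, learning rate 0.001 halved every 30 epochs, switch at epoch 100, buffer update every 5 epochs, 5 folds) come from the entries above.

```python
"""Sketch of the reported training schedule and 5-fold protocol (pure Python)."""

BASE_LR = 1e-3        # reported Adam learning rate
LR_HALVE_EVERY = 30   # reported: learning rate halved every 30 epochs
TOTAL_EPOCHS = 200    # reported number of training epochs
SWITCH_EPOCH = 100    # reported: first 100 epochs use one setting, the rest another
BUFFER_EVERY = 5      # reported: solution buffer updated every 5 epochs


def learning_rate(epoch: int) -> float:
    """Learning rate at a 0-indexed epoch (the indexing is an assumption)."""
    return BASE_LR * 0.5 ** (epoch // LR_HALVE_EVERY)


def in_first_phase(epoch: int) -> bool:
    """True during the first 100 epochs, before the reported switch."""
    return epoch < SWITCH_EPOCH


def updates_buffer(epoch: int) -> bool:
    """True when the buffer is refreshed after this epoch (assumed placement)."""
    return (epoch + 1) % BUFFER_EVERY == 0


def five_fold_splits(n_items: int, k: int = 5):
    """Yield (train_indices, test_indices) for contiguous k-fold splits."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


if __name__ == "__main__":
    for epoch in (0, 29, 30, 99, 100, 199):
        print(epoch, learning_rate(epoch), in_first_phase(epoch), updates_buffer(epoch))
```

In a PyTorch training loop, one plausible counterpart of `learning_rate` is `torch.optim.lr_scheduler.StepLR` with `step_size=30` and `gamma=0.5`; whether the authors used that scheduler is not stated in the table.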