Personalized Mathematical Word Problem Generation

Authors: Oleksandr Polozov, Eleanor O'Rourke, Adam M. Smith, Luke Zettlemoyer, Sumit Gulwani, Zoran Popović

IJCAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental, User Study | "We report an evaluation of generated problems by comparing human judgements with textbook problems (Section 6). Our problems have slightly more artificial language, but they are generally comprehensible, and as solvable as the textbook problems." "We prepared an ontology of 100-200 types, relations, and tropes in three literary settings: Fantasy, Science Fiction, School of Wizardry. This one-time initial setup of the system took about 1-2 person-months. From it, we randomly generated 25 problems in the domains of age, counting, and trading, with the solutions requiring 2-4 primitive arithmetic operations. We sampled the problems with sufficient linguistic variability to evaluate the overall text quality. Although the ASP solving has exponential complexity, every problem was generated in less than 60 s, which is a feasible time limit for realistic problems within our range of interests. We selected 25 textbook problems from the Singapore Math curriculum [Publications, 2009] with the equivalent distribution of complexity (solution lengths), and conducted two studies using Mechanical Turk. Study A assessed language aspects of the problems. It asked the subjects 4 questions (shown in Figure 3) on a forced-choice Likert scale. Study B assessed mathematical applicability of the problems. It asked the subjects to solve a given problem, and measured solving time and correctness. For both studies, each problem was presented to 20 different native English speakers (1000 total)." (Hedged sketches of the generation call and the study-size arithmetic appear after this table.)
Researcher Affiliation | Collaboration | Oleksandr Polozov (University of Washington, polozov@cs.washington.edu); Eleanor O'Rourke (University of Washington, eorourke@cs.washington.edu); Adam M. Smith (University of Washington, amsmith@cs.washington.edu); Luke Zettlemoyer (University of Washington, lsz@cs.washington.edu); Sumit Gulwani (Microsoft Research Redmond, sumitg@microsoft.com); Zoran Popović (University of Washington, zoran@cs.washington.edu)
Pseudocode | No | The paper includes code snippets to illustrate concepts (e.g., ASP syntax for requirements and logic generation), but it does not provide structured pseudocode blocks or formally labeled algorithms.
Open Source Code | No | The paper does not provide an explicit statement or a link indicating that the source code for its methodology is publicly available.
Open Datasets | Yes | "We selected 25 textbook problems from the Singapore Math curriculum [Publications, 2009] with the equivalent distribution of complexity (solution lengths)..."
Dataset Splits | No | The paper describes the setup of a user study to evaluate the generated problems, but it does not specify dataset splits (e.g., training/validation/test percentages or counts) for a machine learning model's training or validation process.
Hardware Specification | No | The paper mentions that "every problem was generated in less than 60 s", but it does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments or generating problems.
Software Dependencies | No | The paper mentions using "answer-set programming (ASP)" and "state-of-the-art ASP solvers", but it does not provide specific version numbers for these or any other software dependencies needed for reproducibility.
Experiment Setup | No | The paper describes the setup of the user study (e.g., number of problems, subjects, Likert scale), but it does not provide specific experimental setup details for the generative system itself, such as solver settings or other system-level configuration (hyperparameters such as learning rates, batch sizes, or optimizer details do not apply, since no model is trained).
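
The Research Type row quotes the paper's claim that, despite the worst-case exponential complexity of ASP solving, every problem was generated in under 60 s. The authors released no code and do not name their solver, so the following is only a minimal sketch of what such a generation call could look like, assuming the off-the-shelf clingo solver and a hypothetical encoding file problem_generator.lp; none of these names come from the paper.

```python
# Hedged sketch only: the paper releases no code and does not name its ASP
# solver. "clingo", the encoding file name, and the flags are assumptions.
import subprocess

ASP_ENCODING = "problem_generator.lp"  # hypothetical ontology + requirements encoding

def generate_problem(seed: int, timeout_s: int = 60):
    """Request one randomly sampled answer set within the paper's 60 s budget."""
    result = subprocess.run(
        [
            "clingo", ASP_ENCODING, "1",        # stop after the first model
            f"--time-limit={timeout_s}",        # per-problem budget from the paper
            f"--seed={seed}", "--rand-freq=1",  # randomized search for variety
        ],
        capture_output=True,
        text=True,
    )
    # On success clingo prints "Answer: 1" followed by the model's atoms.
    if "Answer: 1" not in result.stdout:
        return None  # unsatisfiable within the budget, or timed out
    atoms_line = result.stdout.split("Answer: 1", 1)[1].splitlines()[1]
    return atoms_line.split()  # logic atoms; English realization is a separate step
```

In the reported evaluation such a call would be repeated 25 times across the age, counting, and trading domains, with each returned answer set then realized as an English story.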
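
For the study-size figures, the quoted "(1000 total)" is consistent with presenting each of the 50 problems (25 generated plus 25 textbook) to 20 raters; a quick check of this bookkeeping, with the 4-question count for Study A taken from the quote above:

```python
# Arithmetic implied by the quoted study setup; not taken from released code.
generated, textbook, raters_per_problem = 25, 25, 20
presentations = (generated + textbook) * raters_per_problem
assert presentations == 1000           # matches the "(1000 total)" figure
study_a_responses = presentations * 4  # Study A asked 4 Likert questions each
print(presentations, study_a_responses)  # -> 1000 4000
```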