Personalized Mathematical Word Problem Generation

Authors: Oleksandr Polozov, Eleanor O'Rourke, Adam M. Smith, Luke Zettlemoyer, Sumit Gulwani, Zoran Popović

IJCAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental, User Study | "We report an evaluation of generated problems by comparing human judgements with textbook problems (Section 6). Our problems have slightly more artificial language, but they are generally comprehensible, and as solvable as the textbook problems." "We prepared an ontology of 100-200 types, relations, and tropes in three literary settings: Fantasy, Science Fiction, School of Wizardry. This one-time initial setup of the system took about 1-2 person-months. From it, we randomly generated 25 problems in the domains of age, counting, and trading, with the solutions requiring 2-4 primitive arithmetic operations. We sampled the problems with sufficient linguistic variability to evaluate the overall text quality. Although the ASP solving has exponential complexity, every problem was generated in less than 60 s, which is a feasible time limit for realistic problems within our range of interests. We selected 25 textbook problems from the Singapore Math curriculum [Publications, 2009] with the equivalent distribution of complexity (solution lengths), and conducted two studies using Mechanical Turk. Study A assessed language aspects of the problems. It asked the subjects 4 questions (shown in Figure 3) on a forced-choice Likert scale. Study B assessed mathematical applicability of the problems. It asked the subjects to solve a given problem, and measured solving time and correctness. For both studies, each problem was presented to 20 different native English speakers (1000 total)." (Hedged sketches of the generation call and the study-size arithmetic appear after this table.)
Researcher Affiliation | Collaboration | Oleksandr Polozov (University of Washington, polozov@cs.washington.edu); Eleanor O'Rourke (University of Washington, eorourke@cs.washington.edu); Adam M. Smith (University of Washington, amsmith@cs.washington.edu); Luke Zettlemoyer (University of Washington, lsz@cs.washington.edu); Sumit Gulwani (Microsoft Research Redmond, sumitg@microsoft.com); Zoran Popović (University of Washington, zoran@cs.washington.edu)
Pseudocode | No | The paper includes code snippets to illustrate concepts (e.g., ASP syntax for requirements and logic generation), but it does not provide structured pseudocode blocks or formally labeled algorithms.
Open Source Code | No | The paper does not provide an explicit statement or a link indicating that the source code for its methodology is publicly available.
Open Datasets | Yes | "We selected 25 textbook problems from the Singapore Math curriculum [Publications, 2009] with the equivalent distribution of complexity (solution lengths)..."
Dataset Splits | No | The paper describes the setup of a user study to evaluate the generated problems, but it does not specify dataset splits (e.g., training/validation/test percentages or counts) for a machine learning model's training or validation process.
Hardware Specification | No | The paper mentions that "every problem was generated in less than 60 s", but it does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments or generating problems.
Software Dependencies | No | The paper mentions using "answer-set programming (ASP)" and "state-of-the-art ASP solvers", but it does not provide specific version numbers for these or any other software dependencies needed for reproducibility.
Experiment Setup | No | The paper describes the setup of the user study (e.g., number of problems, subjects, Likert scale), but it does not provide specific experimental setup details for the generative system itself, such as solver settings or other system-level configuration (hyperparameters such as learning rates, batch sizes, or optimizer details do not apply, since no model is trained).
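
The Research Type row quotes the paper's claim that, despite the worst-case exponential complexity of ASP solving, every problem was generated in under 60 s. The authors released no code and do not name their solver, so the following is only a minimal sketch of what such a generation call could look like, assuming the off-the-shelf clingo solver and a hypothetical encoding file problem_generator.lp; none of these names come from the paper.

```python
# Hedged sketch only: the paper releases no code and does not name its ASP
# solver. "clingo", the encoding file name, and the flags are assumptions.
import subprocess

ASP_ENCODING = "problem_generator.lp"  # hypothetical ontology + requirements encoding

def generate_problem(seed: int, timeout_s: int = 60):
    """Request one randomly sampled answer set within the paper's 60 s budget."""
    result = subprocess.run(
        [
            "clingo", ASP_ENCODING, "1",        # stop after the first model
            f"--time-limit={timeout_s}",        # per-problem budget from the paper
            f"--seed={seed}", "--rand-freq=1",  # randomized search for variety
        ],
        capture_output=True,
        text=True,
    )
    # On success clingo prints "Answer: 1" followed by the model's atoms.
    if "Answer: 1" not in result.stdout:
        return None  # unsatisfiable within the budget, or timed out
    atoms_line = result.stdout.split("Answer: 1", 1)[1].splitlines()[1]
    return atoms_line.split()  # logic atoms; English realization is a separate step
```

In the reported evaluation such a call would be repeated 25 times across the age, counting, and trading domains, with each returned answer set then realized as an English story.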
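
For the study-size figures, the quoted "(1000 total)" is consistent with presenting each of the 50 problems (25 generated plus 25 textbook) to 20 raters; a quick check of this bookkeeping, with the 4-question count for Study A taken from the quote above:

```python
# Arithmetic implied by the quoted study setup; not taken from released code.
generated, textbook, raters_per_problem = 25, 25, 20
presentations = (generated + textbook) * raters_per_problem
assert presentations == 1000           # matches the "(1000 total)" figure
study_a_responses = presentations * 4  # Study A asked 4 Likert questions each
print(presentations, study_a_responses)  # -> 1000 4000
```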