Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Synthesis of Geometry Proof Problems
Authors: Chris Alvin, Sumit Gulwani, Rupak Majumdar, Supratik Mukhopadhyay
AAAI 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results indicate that our problem generation algorithm can effectively generate proof problems in elementary geometry. On a corpus of 110 figures taken from popular geometry textbooks, our system generated an average of about 443 problems per figure in an average time of 4.7 seconds per figure. |
| Researcher Affiliation | Collaboration | Chris Alvin Louisiana State University EMAIL Sumit Gulwani Microsoft Research, Redmond EMAIL Rupak Majumdar MPI-SWS EMAIL Supratik Mukhopadhyay Louisiana State University EMAIL |
| Pseudocode | Yes | Algorithm 1: Algorithm All Minimal Sets and Algorithm 2: Algorithm Gen Problem |
| Open Source Code | No | No concrete access to source code for the methodology described in this paper was found. |
| Open Datasets | No | We ran our problem generation algorithm on a set of 110 figures taken from standard mathematics textbooks in India (Sinclair and Dikshit 2006b; 2006a) as well as textbooks and workbooks popular in the United States (Boyd 2006; Larson et al. 2007; Mc Dougal 2007; Jurgensen, Brown, and Jurgensen 1988). |
| Dataset Splits | No | The paper uses a corpus of 110 figures, but does not provide any information about train/validation/test splits for these figures. |
| Hardware Specification | Yes | We ran our experiments on a laptop with Intel Core i5-2520M CPU at 2.5GHz with 8 GB RAM on 64-bit Windows 7 operating system. |
| Software Dependencies | No | The paper mentions '64-bit Windows 7 operating system' but does not specify any other ancillary software components with version numbers required to replicate the experiment. |
| Experiment Setup | Yes | We modified Gen Problem to only generate problems where |G| 2. For each (Fig, Axm) pair, we fixed I to be the minimal set of assumptions that corresponded to the original textbook problem description corresponding to the figure F. |