Autoformalizing Euclidean Geometry

Authors: Logan Murphy, Kaiyu Yang, Jialiang Sun, Zhaoyu Li, Anima Anandkumar, Xujie Si

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments with GPT-4 and GPT-4V show the capability and limitations of state-of-the-art LLMs on autoformalizing geometry problems.
Researcher Affiliation Academia Logan Murphy 1 * Kaiyu Yang 2 * Jialiang Sun 1 Zhaoyu Li 1 Anima Anandkumar 2 Xujie Si 1 1University of Toronto 2Caltech.
Pseudocode Yes We describe our variant of the formal system E in Lean-like pseudocode, including the complete list of axioms.
Open Source Code Yes The data and code are available at https://github. com/loganrjmurphy/Lean Euclid.
Open Datasets Yes We construct Lean Euclid, an autoformalization benchmark consisting of problems from Euclid s Elements and the Uni Geo dataset formalized in the Lean proof assistant. ... Data examples in Lean Euclid are manually formalized into Lean from Euclid s Elements (Heiberg, 2007) and the Uni Geo dataset (Chen et al., 2022).
Dataset Splits Yes To implement few-shot learning, we randomly selected five propositions from Euclid s Elements and five problems from each category in the Uni Geo dataset, serving as in-context learning examples. We then use E3 to automatically evaluate the results of each round. To see how well E3 correlates with human evaluation, we manually evaluate a sample of formalized theorems from Elements to identify any false negatives/positives.
Hardware Specification No No specific hardware details (e.g., GPU/CPU models, memory) are mentioned for the experiments, only the LLM models used (GPT-4, GPT-4V).
Software Dependencies Yes Experiments were conducted in January 2024 using gpt-4-1106-preview and gpt-4-1106-vision-preview.
Experiment Setup Yes To implement few-shot learning, we randomly selected five propositions from Euclid s Elements and five problems from each category in the Uni Geo dataset, serving as in-context learning examples. ... Our prompt template contains an overview of Lean Euclid syntax, some examples of wellformed formulas, and various guidelines for the task. For few-shot learning, we also include k examples of informal-formal pairs as in-context demonstrations. The complete prompt template is in Appendix D.