Solving Probability Problems in Natural Language
Authors: Anton Dries, Angelika Kimmig, Jesse Davis, Vaishak Belle, Luc De Raedt
IJCAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper reports: 'On a dataset of 2160 probability problems, our solver is able to correctly answer 97.5% of the questions given a correct model. On the end-to-end evaluation, we are able to answer 12.5% of the questions (or 31.1% if we exclude examples not supported by design).' (See the ProbLog sketch after the table.) |
| Researcher Affiliation | Academia | Department of Computer Science, KU Leuven, Belgium; University of Edinburgh, UK |
| Pseudocode | No | The paper describes its methods in prose but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'An online version of our system is available at https://dtai.cs.kuleuven.be/problog/natural_language.', which refers to an online system or demo, not explicitly the source code for the methodology described in the paper. |
| Open Datasets | No | The paper states: 'we hired three students to collect and label probability problems from textbooks and online sources. This has resulted in 2376 probability-related problem descriptions. For 2160 (90.9%) of these examples, we could derive a formal model'. While a dataset was created and used, no concrete access information (link, DOI, citation for public release) is provided for this dataset. |
| Dataset Splits | No | The paper mentions 'trained on 200 randomly selected examples' for the NLP classifier, but it does not provide specific training/test/validation splits (e.g., percentages or counts) for the main dataset of 2160 probability problems used in the overall evaluation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions software like 'ProbLog', 'Stanford CoreNLP', and 'scikit-learn MLPClassifier', but it does not specify any version numbers for these software dependencies. |
| Experiment Setup | Yes | The paper states: 'Our solver could solve 2106 correctly within a time limit of 60 seconds per task. [...] This classification is based on a neural-network classifier (using scikit-learn's MLPClassifier) trained on 200 randomly selected examples. As features, we use 45 features that describe the structure of the parse tree around the number (see Table 1 for a summary of these features).' (See the classifier sketch after the table.) |
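
The solver evaluated in the Research Type row builds on ProbLog. To make concrete what "answering a question given a correct model" involves, the following is a minimal sketch using the `problog` Python package; the example problem (two fair coin flips) and the predicate names are illustrative assumptions, not taken from the paper's dataset.

```python
# Minimal sketch: evaluating a small probabilistic model with the ProbLog
# Python API. The example problem (two fair coin flips) and the predicate
# names are illustrative; they are not drawn from the paper's dataset.
from problog.program import PrologString
from problog import get_evaluatable

MODEL = """
0.5::heads(1).               % coin 1 lands heads with probability 0.5
0.5::heads(2).               % coin 2 lands heads with probability 0.5
two_heads :- heads(1), heads(2).
query(two_heads).            % ask: probability that both coins land heads
"""

# Compile the program and compute the probability of each query atom.
result = get_evaluatable().create_from(PrologString(MODEL)).evaluate()
for query, probability in result.items():
    print(query, probability)  # expected: two_heads 0.25
```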
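
The Experiment Setup row mentions a scikit-learn MLPClassifier trained on 200 labelled examples with 45 parse-tree features. The sketch below reproduces that setup in outline only: the feature matrix, labels, and hyperparameters are placeholders, since the paper does not report them, so treat this as an assumption-laden illustration rather than the authors' configuration.

```python
# Minimal sketch of the number-classification step: a scikit-learn
# MLPClassifier trained on 200 examples, each represented by 45 features
# describing the parse tree around a number. The random features/labels
# and the hyperparameters are placeholders, not values from the paper.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_examples, n_features = 200, 45
X = rng.random((n_examples, n_features))   # stand-in for parse-tree features
y = rng.integers(0, 2, size=n_examples)    # stand-in for number-role labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)

print("held-out accuracy:", clf.score(X_test, y_test))
```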