Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Leveraging Qualitative Reasoning to Improve SFL
Authors: Alexandre Perez, Rui Abreu
IJCAI 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation shows that augmenting SFL with qualitative components can improve diagnostic accuracy in 54% of the considered real-world subjects. |
| Researcher Affiliation | Academia | (1) University of Porto, Portugal; (2) HASLab, INESC-TEC; (3) IST, University of Lisbon, Portugal; (4) INESC-ID |
| Pseudocode | No | The paper describes methods and steps in prose but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All scripts used to run this experiment, as well as the gathered data, are available at https://github.com/aperez/q-sfl-experiments. |
| Open Datasets | Yes | We have sourced experimental subjects from the Defects4J (D4J) database. D4J is a catalog of 395 real, reproducible software bugs from 6 open-source projects, namely JFreeChart, Google Closure compiler, Apache Commons Lang, Apache Commons Math, Mockito, and Joda-Time. For each bug, a developer-written, fault-revealing test suite is made available. [Footnote 3: Defects4J 1.1.0 is available at https://github.com/rjust/defects4j (accessed May 2018).] |
| Dataset Splits | No | Hence, we do not break our data into training and test sets, as is customary in prediction scenarios. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running the experiments. |
| Software Dependencies | No | We chose popular classification algorithms [Han et al., 2011] implemented in the Scikit-learn package. X-means, as implemented in the pyclustering package, was selected as it can automatically decide the optimal number of clusters to use [Pelleg and Moore, 2000]. |
| Experiment Setup | Yes | Using the recorded argument and return value data, we create multiple (automated) partitioning models resulting in several Q-SFL variants. A static partitioning variant using automated sign partitioning based on the variable's type, as described in Section 3.2, was considered. For dynamic partitioning, several clustering and classification algorithms were considered: k-NN, linear classification, logistic regression, decision trees, random forest, and x-means clustering. Test outcomes are used as the class labels in the case of supervised models. |
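
The static sign-partitioning variant quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not the authors' implementation: the label set (`negative`, `zero`, `positive`, `true`, `false`, `null`, `non-null`), the function names `sign_partition` and `partition_outcomes`, and the sample observations are all assumptions made for illustration. The sketch maps each recorded value to a qualitative label based on its type, then counts passing and failing test observations per label.

```python
from collections import Counter
from numbers import Real


def sign_partition(value):
    """Map a recorded value to a qualitative label (assumed label set)."""
    if value is None:
        return "null"
    # Check bool before Real, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, Real):
        if value < 0:
            return "negative"
        if value > 0:
            return "positive"
        return "zero"
    # Any other object is only distinguished from null in this sketch.
    return "non-null"


def partition_outcomes(observations):
    """Count passing and failing observations per qualitative label.

    `observations` is an iterable of (value, test_passed) pairs.
    """
    counts = Counter()
    for value, test_passed in observations:
        outcome = "pass" if test_passed else "fail"
        counts[(sign_partition(value), outcome)] += 1
    return counts


# Hypothetical recorded values of one method argument, with test outcomes.
observations = [(-3, False), (0, True), (7, True), (-1, False), (2.5, True)]
counts = partition_outcomes(observations)

for (label, outcome), n in sorted(counts.items()):
    print(f"{label:>8} {outcome}: {n}")
```

In this made-up data, every failing observation falls into the `negative` partition, which is the kind of signal a qualitative component is meant to expose. The dynamic variants listed in the table would replace `sign_partition` with a learned model trained on the same value and outcome pairs.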