Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Synthesizing Argumentation Frameworks from Examples
Authors: Andreas Niskanen, Johannes P. Wallner, Matti Järvisalo
JAIR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Going beyond defining the AF synthesis problem, we study both theoretical and practical aspects of the problem. In particular, we ... (iv) empirically evaluate our algorithms on different forms of AF synthesis instances |
| Researcher Affiliation | Academia | Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Finland; Institute of Logic and Computation, TU Wien, Austria |
| Pseudocode | Yes | Algorithm 1 CEGAR-based AF synthesis on input (A, E+, E−, σ = prf). |
| Open Source Code | Yes | The implementations and benchmarks used in the evaluation are available online at http://www.cs.helsinki.fi/group/coreo/afsynth/. |
| Open Datasets | Yes | The first set of benchmarks was generated based on the benchmark AFs used in the ICCMA 15 competition (Thimm et al., 2016) |
| Dataset Splits | Yes | For each AF, we picked uniformly at random 5 positive examples from the set of extensions. To obtain negative examples, we selected 10, 20, ..., 150 subsets of ⋃_{S∈E+} S uniformly at random... The second set of benchmarks was generated using the following random model. We picked 5, 10, ..., 80 positive examples from a set of 100 arguments uniformly at random with probability p⁺_arg = 0.25. Then \|E−\| = 20, 40, ..., 200 negative examples were sampled from the set A = ⋃_{S∈E+} S, and each argument was included with probability p⁻_arg = (∑_{e∈E+} \|S_e\| / \|E+\|) / \|⋃_{S∈E+} S\|. Again, each example was assigned as weight a random integer from the interval [1, 10]. For each choice of parameters, this procedure was repeated 10 times to obtain a representative set of benchmarks. The instances for preferred semantics were generated following the same random model, using \|A\| = 20, \|E+\| = 5, 10, 15, 20 and \|E−\| = 10, 20, 30, 40, 50 as parameters. |
| Hardware Specification | Yes | The experiments were run on 2.83-GHz Intel Xeon E5440 quad-core machines with 32-GB memory and Debian GNU/Linux 8 using a per-instance timeout of 900 seconds. |
| Software Dependencies | Yes | For the experiments, we used a variety of state-of-the-art MaxSAT solvers...: MaxHS (version 2.9.0)... Maxino (version k16)... MSCG (version 2014)... Open-WBO (version 1.3.1)... and WPM3 (version 2015.co)... As the ... ASP solver, we used Clingo (version 5.2.1)... we used the SAT-IP hybrid MaxSAT solver LMHS (MaxSAT Evaluation 2016 version)... Furthermore, we used MiniSat... (version 2.2.0) as the SAT solver within the CEGAR approach. |
| Experiment Setup | No | The paper describes how benchmark instances were generated and evaluated, but it does not provide specific hyperparameters for any model training or system-level configuration details beyond the hardware and software used. |
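The random benchmark-generation model quoted in the Dataset Splits row can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, parameter names, and the use of Python's `random` module are our assumptions; only the sampling probabilities and weight interval come from the quoted text.

```python
import random

def generate_instance(n_args=100, n_pos=5, n_neg=20, p_pos=0.25, seed=0):
    """Sketch of the quoted random model (names are illustrative)."""
    rng = random.Random(seed)
    args = range(n_args)
    # Each positive example includes each argument independently
    # with probability p⁺_arg = 0.25.
    positives = [{a for a in args if rng.random() < p_pos}
                 for _ in range(n_pos)]
    union = set().union(*positives)
    # Negative examples are sampled from the union of the positive
    # examples; the inclusion probability is the mean positive-example
    # size divided by the union size.
    p_neg = (sum(len(s) for s in positives) / n_pos) / len(union)
    negatives = [{a for a in union if rng.random() < p_neg}
                 for _ in range(n_neg)]
    # Each example is assigned a random integer weight from [1, 10].
    weights = [rng.randint(1, 10) for _ in range(n_pos + n_neg)]
    return positives, negatives, weights
```

Repeating this for each parameter choice (the paper reports 10 repetitions per setting) would yield a benchmark set analogous to the one described.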