Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Bayesian Inference of Linear Temporal Logic Specifications for Contrastive Explanations
Authors: Joseph Kim, Christian Muise, Ankit Shah, Shubham Agarwal, Julie Shah
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated the effectiveness of Bayes LTL for inferring contrastive explanations from sets of traces generated from International Planning Competition (IPC) planning domains... (Section 5.1). Table 2 shows the inference results on the tested domains and on problem instances of varying complexity. (Section 6). |
| Researcher Affiliation | Collaboration | Joseph Kim¹, Christian Muise², Ankit Shah¹, Shubham Agarwal², and Julie Shah¹; ¹MIT Computer Science and Artificial Intelligence Laboratory, ²MIT-IBM Watson AI Lab. EMAIL, EMAIL |
| Pseudocode | No | The paper describes algorithmic steps in prose (e.g., 'MH sampling requires a user-defined proposal function F(ϕ′ \| ϕ) that samples a new candidate ϕ′ given the current ϕ.'). It does not present structured pseudocode or an algorithm block. |
| Open Source Code | No | The paper provides a link to a baseline's code: 'https://github.com/gergia/samples2LTL (commit: 69f692a).' There is no statement or link for the authors' own code for Bayes LTL. |
| Open Datasets | Yes | We evaluated the effectiveness of Bayes LTL for inferring contrastive explanations from sets of traces generated from International Planning Competition (IPC) planning domains [Long and Fox, 2003]. |
| Dataset Splits | Yes | We collected twenty traces for each set. (Section 5.1). A total of 24 instances (i.e., traces) of LFEs were separated into positive and negative sets by a subject matter expert. The detail of the input was as follows: \|πA\| = 16, \|πB\| = 8, \|V\| = 15, and the average length of traces involved 11 time steps. (Section 6). |
| Hardware Specification | Yes | All experiments were conducted on Debian machines with Intel Xeon E3-1200 CPUs at 1.8 GHz using up to 4 GB of RAM. |
| Software Dependencies | No | The paper mentions 'Debian machines' but does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks, or solvers). |
| Experiment Setup | Yes | α = β = 0.01, to put equal importance of positive and negative sets, λ = 0.7 to penalize ϕ for every additional conjunct, and ϵ = 0.2 to apply ϵ-greedy search in the proposal function. We ran the MH sampler with num_MH = 2,000 iterations with the first 300 used as a burn-in period. |
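Because the paper provides neither pseudocode nor code, the following is a minimal, hypothetical sketch of how the reported sampler settings (ϵ = 0.2, λ = 0.7, num_MH = 2,000, burn-in of 300) fit together in a Metropolis-Hastings loop. It is not the authors' Bayes LTL implementation. Assumptions: a candidate formula ϕ is modeled as a set of conjunct ids; the per-conjunct `WEIGHTS` are made-up stand-ins for the real likelihood (α and β, which weight positive and negative traces, are omitted); the prior is assumed to multiply by λ per conjunct; and the ϵ-greedy proposal toggles one conjunct, either uniformly at random (probability ϵ) or greedily. Since that proposal is not symmetric, the sketch includes the Hastings correction.

```python
import math
import random
from collections import Counter

# HYPOTHETICAL per-conjunct scores standing in for how well each candidate
# conjunct separates positive from negative traces (not the paper's likelihood).
WEIGHTS = [2.0, 1.5, -1.0, -0.5]
LAMBDA = 0.7  # assumed prior form: each extra conjunct multiplies the prior by lambda


def log_target(phi):
    """Unnormalized log posterior of a conjunction phi (a frozenset of conjunct ids)."""
    return sum(WEIGHTS[i] for i in phi) + len(phi) * math.log(LAMBDA)


def neighbors(phi, n):
    """All conjunctions reachable by adding or removing exactly one conjunct."""
    return [phi ^ {i} for i in range(n)]  # frozenset ^ set stays a frozenset


def greedy_move(phi, n):
    """Best-scoring neighbor; max() breaks ties by the lowest conjunct id."""
    return max(neighbors(phi, n), key=log_target)


def proposal_prob(new, old, n, eps):
    """q(new | old) under the epsilon-greedy single-toggle proposal."""
    p = eps / n  # random toggle, chosen with probability eps
    if new == greedy_move(old, n):
        p += 1.0 - eps  # greedy toggle, chosen with probability 1 - eps
    return p


def propose(phi, n, eps, rng):
    """Draw a candidate: random toggle with prob. eps, otherwise the greedy toggle."""
    if rng.random() < eps:
        return phi ^ {rng.randrange(n)}
    return greedy_move(phi, n)


def mh_sample(n, num_mh=2000, burn_in=300, eps=0.2, seed=0):
    """Run Metropolis-Hastings and return the states visited after burn-in."""
    rng = random.Random(seed)
    phi = frozenset()  # start from the empty conjunction
    samples = []
    for it in range(num_mh):
        cand = propose(phi, n, eps, rng)
        # Hastings ratio: target ratio times reverse/forward proposal ratio.
        log_a = (log_target(cand) - log_target(phi)
                 + math.log(proposal_prob(phi, cand, n, eps))
                 - math.log(proposal_prob(cand, phi, n, eps)))
        if rng.random() < math.exp(min(0.0, log_a)):
            phi = cand
        if it >= burn_in:  # discard the first burn_in iterations
            samples.append(phi)
    return samples


samples = mh_sample(n=len(WEIGHTS))
mode, count = Counter(samples).most_common(1)[0]
print(sorted(mode), count / len(samples))
```

With 2,000 iterations and a 300-iteration burn-in, 1,700 states are retained, and the most frequent one serves as the point estimate. Under these toy weights, that estimate keeps the two positively scored conjuncts. Keeping a small random component (ϵ > 0) ensures that every neighbor has a nonzero proposal probability, so the acceptance ratio stays well defined.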