On the (In)Tractability of Reinforcement Learning for LTL Objectives
Authors: Cambridge Yang, Michael L. Littman, Michael Carbin
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper reports experiments with current reinforcement-learning algorithms for LTL objectives that provide empirical support for its theoretical result. In Section 5, it explicitly states: "This section empirically demonstrates our main result, the forward direction of Theorem 1." |
| Researcher Affiliation | Academia | Cambridge Yang (MIT CSAIL), Michael L. Littman (Brown University), Michael Carbin (MIT CSAIL); camyang@csail.mit.edu, mlittman@cs.brown.edu, mcarbin@csail.mit.edu |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. It describes algorithms conceptually and refers to external works for their implementation details. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code for the methodology or a link to a code repository. |
| Open Datasets | No | The paper uses 'counterexample MDPs' (Figure 2, Figure 3) that are theoretically constructed environments, and also mentions adapting a case study from another paper ([Sadigh et al., 2014]). These are not datasets in the traditional sense, and no concrete access information (link, DOI, etc.) is provided for any 'dataset' that would be used for training. For example, Section 5.1 says 'The first pair is the formula F h and the counterexample MDP as shown in Figure 2.' |
| Dataset Splits | No | The paper does not specify dataset splits (e.g., train/validation/test percentages or sample counts). The empirical evaluation is performed on constructed MDPs and LTL objectives, not on pre-existing datasets with defined splits. For instance, the paper discusses varying 'p' and 'N' for its empirical evaluation, but this does not constitute dataset splitting information. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU specifications, or memory. |
| Software Dependencies | No | The paper mentions using 'various recent reinforcement-learning algorithms for LTL objectives [Sadigh et al., 2014; Hahn et al., 2019; Bozkurt et al., 2020]' in Section 5.1. However, it does not specify the versions of any software libraries, programming languages, or other dependencies used to implement or run these algorithms, which is necessary for reproducibility. |
| Experiment Setup | Yes | Section 5.1 'Methodology' states: 'For each algorithm and each pair of values of p and N, we fix ε = 0.1 and repeatedly run the algorithm to obtain a Monte Carlo estimation of the LTL-PAC probability... For the first LTL-MDP pair, we vary p by a geometric progression from 10⁻¹ to 10⁻³ in 5 steps. We vary N by a geometric progression from 10¹ to 10⁵ in 21 steps.' These provide specific details about the experimental parameters and ranges tested. |
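
The parameter sweep quoted above can be summarized in a short sketch. This is a minimal illustration and not the authors' code: `run_algorithm`, `is_successful`, and `NUM_TRIALS` are hypothetical placeholders; only ε = 0.1 and the geometric ranges over p and N come from the quoted methodology.

```python
import numpy as np

# Sketch of the sweep described in Section 5.1. The learning algorithm and the
# LTL-PAC success check are passed in as callables, since the paper does not
# release its implementation.

EPSILON = 0.1      # fixed epsilon stated in the paper
NUM_TRIALS = 100   # assumed number of Monte Carlo repetitions (not stated in the paper excerpt)

# Geometric progressions over p and N as quoted from the Methodology.
p_values = np.geomspace(1e-1, 1e-3, num=5)              # 10^-1 .. 10^-3 in 5 steps
N_values = np.geomspace(1e1, 1e5, num=21).astype(int)   # 10^1 .. 10^5 in 21 steps


def estimate_ltl_pac_probability(run_algorithm, is_successful, p, N,
                                 num_trials=NUM_TRIALS, epsilon=EPSILON):
    """Monte Carlo estimate of the LTL-PAC probability for one (p, N) pair.

    run_algorithm(p, N) -> learned policy   (placeholder for an RL algorithm)
    is_successful(policy, p, epsilon) -> bool  (placeholder for the PAC check)
    """
    successes = sum(
        is_successful(run_algorithm(p, N), p, epsilon)
        for _ in range(num_trials)
    )
    return successes / num_trials


# Full grid over (p, N), given concrete run_algorithm / is_successful callables:
# results = {(p, N): estimate_ltl_pac_probability(run_algorithm, is_successful, p, N)
#            for p in p_values for N in N_values}
```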