On the (In)Tractability of Reinforcement Learning for LTL Objectives

Authors: Cambridge Yang, Michael L. Littman, Michael Carbin

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The paper presents experiments with current reinforcement-learning algorithms for LTL objectives that provide empirical support for its theoretical result. In Section 5, the paper explicitly states: "This section empirically demonstrates our main result, the forward direction of Theorem 1."
Researcher Affiliation | Academia | Cambridge Yang (MIT CSAIL), Michael L. Littman (Brown University), Michael Carbin (MIT CSAIL); camyang@csail.mit.edu, mlittman@cs.brown.edu, mcarbin@csail.mit.edu
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. It describes algorithms conceptually and refers to external works for their implementation details.
Open Source Code | No | The paper does not provide an explicit statement about releasing its source code for the methodology, nor a link to a code repository.
Open Datasets | No | The paper uses 'counterexample MDPs' (Figures 2 and 3), which are theoretically constructed environments, and also adapts a case study from [Sadigh et al., 2014]. These are not datasets in the traditional sense, and no concrete access information (link, DOI, etc.) is provided for any 'dataset' that would be used for training. For example, Section 5.1 says: 'The first pair is the formula F h and the counterexample MDP as shown in Figure 2.'
Dataset Splits | No | The paper does not specify dataset splits (e.g., train/validation/test percentages or sample counts). The empirical evaluation is performed on constructed MDPs and LTL objectives, not on pre-existing datasets with defined splits. The paper varies 'p' and 'N' in its empirical evaluation, but this does not constitute dataset-split information.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU specifications, or memory.
Software Dependencies | No | Section 5.1 mentions using 'various recent reinforcement-learning algorithms for LTL objectives [Sadigh et al., 2014; Hahn et al., 2019; Bozkurt et al., 2020]', but the paper does not specify the versions of any software libraries, programming languages, or other dependencies used to implement or run these algorithms, which would be necessary for reproducibility.
Experiment Setup | Yes | Section 5.1 ('Methodology') states: 'For each algorithm and each pair of values of p and N, we fix ϵ = 0.1 and repeatedly run the algorithm to obtain a Monte Carlo estimation of the LTL-PAC probability... For the first LTL-MDP pair, we vary p by a geometric progression from 10^-1 to 10^-3 in 5 steps. We vary N by a geometric progression from 10^1 to 10^5 in 21 steps.' These passages give the specific experimental parameters and the ranges tested.
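
To make the reported methodology concrete, below is a minimal sketch, in Python, of the Monte Carlo estimation and parameter sweep described in Section 5.1. It is not the authors' implementation: the names `run_algorithm`, `is_eps_optimal`, `monte_carlo_ltl_pac`, `sweep`, and `n_trials` are hypothetical stand-ins, and the exponent signs in the p range (10^-1 to 10^-3) are reconstructed from context rather than quoted from the paper.

```python
import numpy as np

# Sketch of the Section 5.1 sweep (not the authors' code).
# Epsilon is fixed at 0.1; p and N are varied over geometric progressions.
EPSILON = 0.1
p_values = np.geomspace(1e-1, 1e-3, num=5)             # 5 geometric steps (signs reconstructed)
N_values = np.geomspace(1e1, 1e5, num=21).astype(int)  # 21 geometric steps

def monte_carlo_ltl_pac(run_algorithm, is_eps_optimal, p, N, n_trials=100):
    """Monte Carlo estimate of the LTL-PAC probability for one (p, N) cell.

    `run_algorithm(p, N)` and `is_eps_optimal(policy, p, eps)` are hypothetical
    callables standing in for an RL-for-LTL algorithm run with N samples on a
    counterexample MDP parameterized by p, and an epsilon-optimality check of
    the returned policy.
    """
    successes = sum(
        bool(is_eps_optimal(run_algorithm(p, N), p, EPSILON))
        for _ in range(n_trials)
    )
    return successes / n_trials

def sweep(run_algorithm, is_eps_optimal):
    """Estimate the LTL-PAC probability for every (p, N) combination."""
    return {
        (p, N): monte_carlo_ltl_pac(run_algorithm, is_eps_optimal, p, N)
        for p in p_values
        for N in N_values
    }
```

A real harness would plug in, as the two callables, the specific algorithms cited in Section 5.1 [Sadigh et al., 2014; Hahn et al., 2019; Bozkurt et al., 2020] and the counterexample MDPs of Figures 2 and 3.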