Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
On the (In)Tractability of Reinforcement Learning for LTL Objectives
Authors: Cambridge Yang, Michael L. Littman, Michael Carbin
IJCAI 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper runs experiments with current reinforcement-learning algorithms for LTL objectives that provide empirical support for its theoretical result. In Section 5, the paper explicitly states: "This section empirically demonstrates our main result, the forward direction of Theorem 1." |
| Researcher Affiliation | Academia | Cambridge Yang (MIT CSAIL), Michael L. Littman (Brown University), Michael Carbin (MIT CSAIL); email addresses redacted. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. It describes algorithms conceptually and refers to external works for their implementation details. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code for the methodology or a link to a code repository. |
| Open Datasets | No | The paper uses 'counterexample MDPs' (Figure 2, Figure 3), which are theoretically constructed environments, and also adapts a case study from another paper ([Sadigh et al., 2014]). These are not datasets in the traditional sense, and no concrete access information (link, DOI, etc.) is provided for any 'dataset' that would be used for training. For example, Section 5.1 says 'The first pair is the formula F h and the counterexample MDP as shown in Figure 2.' |
| Dataset Splits | No | The paper does not specify dataset splits (e.g., train/validation/test percentages or sample counts). The empirical evaluation is performed on constructed MDPs and LTL objectives, not on pre-existing datasets with defined splits. For instance, the paper discusses varying 'p' and 'N' for its empirical evaluation, but this does not constitute dataset splitting information. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU specifications, or memory. |
| Software Dependencies | No | The paper mentions using 'various recent reinforcement-learning algorithms for LTL objectives [Sadigh et al., 2014; Hahn et al., 2019; Bozkurt et al., 2020]' in Section 5.1. However, it does not specify the versions of any software libraries, programming languages, or other dependencies used to implement or run these algorithms, which is necessary for reproducibility. |
| Experiment Setup | Yes | Section 5.1 'Methodology' states: 'For each algorithm and each pair of values of p and N, we fix ε = 0.1 and repeatedly run the algorithm to obtain a Monte Carlo estimation of the LTL-PAC probability... For the first LTL-MDP pair, we vary p by a geometric progression from 10^-1 to 10^-3 in 5 steps. We vary N by a geometric progression from 10^1 to 10^5 in 21 steps.' These quotes give the specific experimental parameters and the ranges tested. |
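The parameter sweep quoted in the Experiment Setup row can be reproduced with a short sketch. This assumes NumPy and takes the endpoints and step counts directly from the quoted text (the paper itself does not publish code, so the variable names here are illustrative):

```python
import numpy as np

# Geometric progressions from Section 5.1 as quoted above:
# p varies from 10^-1 to 10^-3 in 5 steps; N varies from 10^1 to 10^5 in 21 steps.
p_values = np.geomspace(1e-1, 1e-3, num=5)   # MDP parameter p, 5 geometric steps
N_values = np.geomspace(1e1, 1e5, num=21)    # sample budget N, 21 geometric steps

# Each (p, N) pair would then be run repeatedly at epsilon = 0.1 to obtain
# a Monte Carlo estimate of the LTL-PAC probability.
print(len(p_values), len(N_values))
```

`np.geomspace` returns endpoints exactly, so `p_values` spans 0.1 down to 0.001 and `N_values` spans 10 up to 100000, matching the quoted ranges.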