Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees
Authors: Daqian Shao, Marta Kwiatkowska
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Several experiments on various tabular MDP environments across different LTL tasks demonstrate the improved sample efficiency and optimal policy convergence. |
| Researcher Affiliation | Academia | Daqian Shao and Marta Kwiatkowska Department of Computer Science, University of Oxford, UK {daqian.shao, marta.kwiatkowska}@cs.ox.ac.uk |
| Pseudocode | Yes | Algorithm 1: KC Q-learning from LTL; Algorithm 2: CF+KC Q-learning from LTL (an illustrative sketch of the Q-learning core appears after this table). |
| Open Source Code | Yes | The implementation of our algorithms and experiments can be found on GitHub: https://github.com/shaodaqian/rl-from-ltl |
| Open Datasets | Yes | The second MDP environment is the 8×8 frozen lake environment from OpenAI Gym [Brockman et al., 2016]. |
| Dataset Splits | No | The paper does not provide training/test/validation dataset splits (e.g., percentages or sample counts), as would be expected for static datasets. As a reinforcement learning paper, it describes training steps and episodes within environments rather than partitioning a fixed dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions the tools it uses (PRISM, Rabinizer 4, OpenAI Gym) and Q-learning as the core method, but does not provide version numbers for the software dependencies used in the experiments. |
| Experiment Setup | Yes | We set the learning rate α = 0.1 and ϵ = 0.1 for exploration. We also set a relatively loose upper bound on rewards U = 0.1 and discount factor γ = 0.99 for all experiments to ensure optimality. [...] for experiments we opt for a specific reward function that linearly increases the reward for accepting states as the value of K increases, namely rₙ = U·n/K for n ∈ [0..K]. The Q function is optimistically initialized by setting the Q value for all available state-action pairs to 2U. All experiments are run 100 times, where we plot the average satisfaction probability with half standard deviation in the shaded area. |
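The quoted experiment setup pins down a handful of concrete quantities. Below is a minimal Python sketch of the reward schedule rₙ = U·n/K and the optimistic Q-table initialization; it illustrates only the quoted hyperparameters, not the authors' implementation, and the helper names (`accepting_reward`, `init_q_table`) are hypothetical.

```python
import numpy as np

# Hyperparameters as quoted from the paper's experiment setup.
ALPHA = 0.1    # learning rate alpha
EPSILON = 0.1  # epsilon for epsilon-greedy exploration
U = 0.1        # loose upper bound on rewards
GAMMA = 0.99   # discount factor


def accepting_reward(n: int, K: int) -> float:
    """Reward for accepting states, r_n = U * n / K for n in [0..K]:
    increases linearly in n up to the upper bound U."""
    assert 0 <= n <= K
    return U * n / K


def init_q_table(n_states: int, n_actions: int) -> np.ndarray:
    """Optimistic initialization: the Q value of every available
    state-action pair is set to 2U, as stated in the setup."""
    return np.full((n_states, n_actions), 2 * U)
```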
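For the pseudocode row, Algorithms 1 and 2 (KC and CF+KC Q-learning from LTL) operate on a product of the MDP with an automaton derived from the LTL formula, with counting and counterfactual updates; the paper's GitHub repository contains the actual implementation. As a stripped-down illustration only, the sketch below shows the plain ε-greedy tabular Q-learning core on the cited 8×8 frozen lake environment with the quoted hyperparameters. It omits the LTL/automaton product entirely, and it assumes the Gym ≥ 0.26 reset/step API and the `FrozenLake8x8-v1` environment ID.

```python
import gym
import numpy as np

ALPHA, EPSILON, U, GAMMA = 0.1, 0.1, 0.1, 0.99  # quoted hyperparameters
rng = np.random.default_rng(0)

# 8x8 frozen lake environment from OpenAI Gym [Brockman et al., 2016];
# the environment ID assumes Gym >= 0.26.
env = gym.make("FrozenLake8x8-v1")
n_states, n_actions = env.observation_space.n, env.action_space.n
Q = np.full((n_states, n_actions), 2 * U)  # optimistic initialization

for episode in range(10_000):
    state, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection.
        if rng.random() < EPSILON:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Standard Q-learning update; bootstrap only from
        # non-terminal successor states.
        target = reward + GAMMA * np.max(Q[next_state]) * (not terminated)
        Q[state, action] += ALPHA * (target - Q[state, action])
        state = next_state
```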