Transfer of Temporal Logic Formulas in Reinforcement Learning

Authors: Zhe Xu, Ufuk Topcu

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our implementation results show, depending on how similar the source task and the target task are, that the sampling efficiency for the target task can be improved by up to one order of magnitude by performing RL in the extended state space, and further improved by up to another order of magnitude using the transferred extended Q-functions.
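The quoted result concerns Q-learning over an extended state space and warm-starting the target task with transferred extended Q-functions. As a rough illustration only (not the paper's implementation), the sketch below assumes a tabular setting and an environment whose reset/step methods return an extended state (env_state, flag), where flag stands in for the formula-tracking component; the function name, environment interface, and exploration scheme are all assumptions.

```python
# Hypothetical sketch: tabular epsilon-greedy Q-learning over an extended
# state space (env_state, flag). The environment interface and all names
# below are assumptions for illustration, not the paper's implementation.
import random
from collections import defaultdict

def q_learning_extended(env, episodes, alpha=0.8, gamma=0.99, eps=0.1,
                        init_q=None):
    # A transferred extended Q-function can be passed via `init_q`
    # to warm-start learning on the target task.
    Q = defaultdict(float, init_q or {})
    for _ in range(episodes):
        s, flag = env.reset()              # extended state: (state, flag)
        done = False
        while not done:
            ext_s = (s, flag)
            if random.random() < eps:      # epsilon-greedy exploration
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(ext_s, act)])
            (s, flag), r, done = env.step(a)
            best_next = 0.0 if done else max(
                Q[((s, flag), act)] for act in env.actions)
            # Standard Q-learning update with learning rate alpha and
            # discount factor gamma (the values reported in the paper).
            Q[(ext_s, a)] += alpha * (r + gamma * best_next - Q[(ext_s, a)])
    return Q
```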
Researcher Affiliation | Academia | Zhe Xu and Ufuk Topcu, University of Texas at Austin, {zhexu, utopcu}@utexas.edu
Pseudocode | Yes | Algorithm 1: Information-Guided MITLf Inference.
Open Source Code | No | The paper does not provide any explicit statement about open-sourcing code for the described methodology or a link to a code repository.
Open Datasets | No | The paper describes data collection as part of its experimental setup: 'We use the first 10000 episodes of Q-learning as the data collection phase. From the source task, all the 46 trajectories with cumulative rewards above 0 are labeled as 1, and 200 trajectories randomly selected out of the remaining 9954 trajectories are labeled as -1.' This indicates internally generated data, not a publicly available dataset with concrete access information.
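For concreteness, the labeling step quoted above could be reconstructed roughly as follows; the trajectory container, the per-trajectory cumulative rewards, and the function name are assumptions made for illustration.

```python
# Illustrative reconstruction of the quoted labeling step; data layout
# and names are assumptions, not artifacts released with the paper.
import random

def label_trajectories(trajectories, cumulative_rewards, n_negative=200, seed=0):
    # Trajectories with cumulative reward above 0 are labeled +1.
    positive = [t for t, r in zip(trajectories, cumulative_rewards) if r > 0]
    remaining = [t for t, r in zip(trajectories, cumulative_rewards) if r <= 0]
    # A fixed number of the remaining trajectories are sampled and labeled -1.
    rng = random.Random(seed)
    negative = rng.sample(remaining, min(n_negative, len(remaining)))
    return [(t, 1) for t in positive] + [(t, -1) for t in negative]
```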
Dataset Splits | No | The paper does not specify exact training, validation, and test dataset splits needed to reproduce the experiment. It describes a 'data collection phase' for labeling trajectories, but not standard data splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | We set α = 0.8 and γ = 0.99. For the inference problem (Problem 1), we set ϱth = 4 and ζ = 0.95. For Algorithm 1, we set λ = 0.01 and hmax = 2.
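Collected into a single configuration sketch for reference; the key names are illustrative, and only the associations stated in the quote (which parameters belong to Problem 1 and to Algorithm 1) are asserted.

```python
# Reported settings from the quoted experiment setup; key names are
# illustrative and reflect only the associations stated in the quote.
reported_setup = {
    "alpha": 0.8,    # learning rate (standard usage of alpha in Q-learning)
    "gamma": 0.99,   # discount factor (standard usage of gamma)
    "rho_th": 4,     # set for the inference problem (Problem 1)
    "zeta": 0.95,    # set for the inference problem (Problem 1)
    "lambda": 0.01,  # set for Algorithm 1
    "h_max": 2,      # set for Algorithm 1
}
```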