Transfer of Temporal Logic Formulas in Reinforcement Learning
Authors: Zhe Xu, Ufuk Topcu
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our implementation results show, depending on how similar the source task and the target task are, that the sampling efficiency for the target task can be improved by up to one order of magnitude by performing RL in the extended state space, and further improved by up to another order of magnitude using the transferred extended Q-functions. |
| Researcher Affiliation | Academia | Zhe Xu and Ufuk Topcu, University of Texas at Austin, {zhexu, utopcu}@utexas.edu |
| Pseudocode | Yes | Algorithm 1: Information-Guided MITL_f Inference. |
| Open Source Code | No | The paper does not provide any explicit statement about open-sourcing code for the described methodology or a link to a code repository. |
| Open Datasets | No | The paper describes data collection as part of its experimental setup: 'We use the first 10000 episodes of Q-learning as the data collection phase. From the source task, all the 46 trajectories with cumulative rewards above 0 are labeled as 1, and 200 trajectories randomly selected out of the remaining 9954 trajectories are labeled as -1.' This indicates internally generated data, not a publicly available dataset with concrete access information. (A labeling sketch follows the table.) |
| Dataset Splits | No | The paper does not specify exact training, validation, and test dataset splits needed to reproduce the experiment. It describes a 'data collection phase' for labeling trajectories, but not standard data splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | We set α = 0.8 and γ = 0.99. For the inference problem (Problem 1), we set ϱ_th = 4 and ζ = 0.95. For Algorithm 1, we set λ = 0.01 and h_max = 2. (A Q-learning sketch using these values follows the table.) |
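For context on the "Open Datasets" row: the paper's data-collection phase labels source-task trajectories by their cumulative reward before formula inference. Below is a minimal sketch of that labeling step, assuming trajectories are stored as (trajectory, cumulative_reward) pairs gathered over the first 10000 Q-learning episodes; the function and variable names are illustrative, not taken from the paper's code.

```python
import random

def label_trajectories(episodes, num_negative=200, reward_threshold=0.0, seed=0):
    """Label source-task trajectories for temporal-logic formula inference.

    `episodes` is assumed to be a list of (trajectory, cumulative_reward)
    pairs from the data-collection phase (the paper uses the first 10000
    Q-learning episodes). Trajectories with cumulative reward above the
    threshold get label +1; a random subset of the rest gets label -1.
    """
    positives = [(traj, 1) for traj, ret in episodes if ret > reward_threshold]
    negatives_pool = [traj for traj, ret in episodes if ret <= reward_threshold]
    rng = random.Random(seed)
    negatives = [(traj, -1) for traj in rng.sample(negatives_pool, num_negative)]
    return positives + negatives
```

In the paper's source task this procedure yields 46 positively labeled trajectories and 200 negatively labeled ones out of 10000 episodes; the seeded sampling above only makes the sketch itself repeatable.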
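The "Research Type" and "Experiment Setup" rows refer to Q-learning over a state space extended with the progress of an inferred temporal logic formula, using learning rate α = 0.8 and discount factor γ = 0.99. The following is a minimal sketch under assumed interfaces: the extended state is taken to be a pair (s, q) of environment state and formula-progress flag, and the `env`, `formula_progress`, and `transferred_q` arguments are illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict
import random

ALPHA, GAMMA, EPSILON = 0.8, 0.99, 0.1  # alpha and gamma as reported in the paper

def q_learning_extended(env, formula_progress, actions, episodes=10000,
                        transferred_q=None, seed=0):
    """Tabular Q-learning on extended states (s, q).

    `formula_progress(q, s)` is an assumed helper that advances the
    temporal-logic progress flag given the new environment state.
    `transferred_q` optionally initializes the table with an extended
    Q-function transferred from the source task.
    """
    rng = random.Random(seed)
    Q = defaultdict(float, transferred_q or {})
    for _ in range(episodes):
        s, q = env.reset(), 0
        done = False
        while not done:
            # epsilon-greedy action selection over the extended state
            if rng.random() < EPSILON:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, q, a_)])
            s_next, r, done = env.step(a)
            q_next = formula_progress(q, s_next)
            best_next = max(Q[(s_next, q_next, a_)] for a_ in actions)
            # standard Q-learning update, applied to the extended state space
            target = r if done else r + GAMMA * best_next
            Q[(s, q, a)] += ALPHA * (target - Q[(s, q, a)])
            s, q = s_next, q_next
    return Q
```

Passing a source-task Q-table as `transferred_q` corresponds to the transferred extended Q-function initialization; omitting it corresponds to learning in the extended state space from scratch.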