Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
# Transfer of Temporal Logic Formulas in Reinforcement Learning
Authors: Zhe Xu, Ufuk Topcu
IJCAI 2019
| Reproducibility Variable | Result | LLM-Extracted Evidence |
|---|---|---|
| Research Type | Experimental | Our implementation results show, depending on how similar the source task and the target task are, that the sampling efficiency for the target task can be improved by up to one order of magnitude by performing RL in the extended state space, and further improved by up to another order of magnitude using the transferred extended Q-functions. |
| Researcher Affiliation | Academia | Zhe Xu and Ufuk Topcu University of Texas at Austin EMAIL |
| Pseudocode | Yes | Algorithm 1 Information-Guided MITLf Inference. |
| Open Source Code | No | The paper does not provide any explicit statement about open-sourcing code for the described methodology or a link to a code repository. |
| Open Datasets | No | The paper describes data collection as part of its experimental setup: 'We use the first 10000 episodes of Q-learning as the data collection phase. From the source task, all the 46 trajectories with cumulative rewards above 0 are labeled as 1, and 200 trajectories randomly selected out of the remaining 9954 trajectories are labeled as -1.' This indicates internally generated data, not a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper does not specify exact training, validation, and test dataset splits needed to reproduce the experiment. It describes a 'data collection phase' for labeling trajectories, but not standard data splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | We set α = 0.8 and γ = 0.99. For the inference problem (Problem 1), we set ϱth = 4 and ζ = 0.95. For Algorithm 1, we set λ = 0.01 and hmax = 2. |
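The paper's reported hyperparameters (learning rate α = 0.8, discount factor γ = 0.99) plug directly into the standard tabular Q-learning update. The sketch below illustrates that update on a hypothetical 5-state chain MDP; the environment, episode count, and ε-greedy exploration rate are illustrative assumptions, not details taken from the paper.

```python
import random

# Illustrative tabular Q-learning, using the paper's reported
# hyperparameters alpha = 0.8 and gamma = 0.99. The chain MDP below
# is a made-up toy environment, not the paper's task.
ALPHA, GAMMA = 0.8, 0.99
N_STATES, ACTIONS = 5, (0, 1)  # actions: 0 = move left, 1 = move right

def step(state, action):
    """Deterministic chain: reward 1.0 only on reaching the last state."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def q_learning(episodes=500, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[s][act])
            s2, r, done = step(s, a)
            # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
```

After training, the greedy policy moves right toward the rewarding terminal state; with α = 0.8 the value estimates converge quickly on this deterministic toy task, while γ = 0.99 keeps distant rewards nearly undiscounted.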