Transfer of Temporal Logic Formulas in Reinforcement Learning

Authors: Zhe Xu, Ufuk Topcu

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our implementation results show, depending on how similar the source task and the target task are, that the sampling efficiency for the target task can be improved by up to one order of magnitude by performing RL in the extended state space, and further improved by up to another order of magnitude using the transferred extended Q-functions.
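The quoted result concerns Q-learning over an extended state space and warm-starting the target task with transferred extended Q-functions. As a rough illustration only (not the paper's implementation), the sketch below assumes a tabular setting and an environment whose reset/step methods return an extended state (env_state, flag), where flag stands in for the formula-tracking component; the function name, environment interface, and exploration scheme are all assumptions.

```python
# Hypothetical sketch: tabular epsilon-greedy Q-learning over an extended
# state space (env_state, flag). The environment interface and all names
# below are assumptions for illustration, not the paper's implementation.
import random
from collections import defaultdict

def q_learning_extended(env, episodes, alpha=0.8, gamma=0.99, eps=0.1,
                        init_q=None):
    # A transferred extended Q-function can be passed via `init_q`
    # to warm-start learning on the target task.
    Q = defaultdict(float, init_q or {})
    for _ in range(episodes):
        s, flag = env.reset()              # extended state: (state, flag)
        done = False
        while not done:
            ext_s = (s, flag)
            if random.random() < eps:      # epsilon-greedy exploration
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(ext_s, act)])
            (s, flag), r, done = env.step(a)
            best_next = 0.0 if done else max(
                Q[((s, flag), act)] for act in env.actions)
            # Standard Q-learning update with learning rate alpha and
            # discount factor gamma (the values reported in the paper).
            Q[(ext_s, a)] += alpha * (r + gamma * best_next - Q[(ext_s, a)])
    return Q
```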
Researcher Affiliation | Academia | Zhe Xu and Ufuk Topcu, University of Texas at Austin, {zhexu, utopcu}@utexas.edu
Pseudocode | Yes | Algorithm 1: Information-Guided MITLf Inference.
Open Source Code | No | The paper does not provide any explicit statement about open-sourcing code for the described methodology or a link to a code repository.
Open Datasets | No | The paper describes data collection as part of its experimental setup: 'We use the first 10000 episodes of Q-learning as the data collection phase. From the source task, all the 46 trajectories with cumulative rewards above 0 are labeled as 1, and 200 trajectories randomly selected out of the remaining 9954 trajectories are labeled as -1.' This indicates internally generated data, not a publicly available dataset with concrete access information.
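For concreteness, the labeling step quoted above could be reconstructed roughly as follows; the trajectory container, the per-trajectory cumulative rewards, and the function name are assumptions made for illustration.

```python
# Illustrative reconstruction of the quoted labeling step; data layout
# and names are assumptions, not artifacts released with the paper.
import random

def label_trajectories(trajectories, cumulative_rewards, n_negative=200, seed=0):
    # Trajectories with cumulative reward above 0 are labeled +1.
    positive = [t for t, r in zip(trajectories, cumulative_rewards) if r > 0]
    remaining = [t for t, r in zip(trajectories, cumulative_rewards) if r <= 0]
    # A fixed number of the remaining trajectories are sampled and labeled -1.
    rng = random.Random(seed)
    negative = rng.sample(remaining, min(n_negative, len(remaining)))
    return [(t, 1) for t in positive] + [(t, -1) for t in negative]
```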
Dataset Splits | No | The paper does not specify exact training, validation, and test dataset splits needed to reproduce the experiment. It describes a 'data collection phase' for labeling trajectories, but not standard data splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | We set α = 0.8 and γ = 0.99. For the inference problem (Problem 1), we set ϱth = 4 and ζ = 0.95. For Algorithm 1, we set λ = 0.01 and hmax = 2.
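Collected into a single configuration sketch for reference; the key names are illustrative, and only the associations stated in the quote (which parameters belong to Problem 1 and to Algorithm 1) are asserted.

```python
# Reported settings from the quoted experiment setup; key names are
# illustrative and reflect only the associations stated in the quote.
reported_setup = {
    "alpha": 0.8,    # learning rate (standard usage of alpha in Q-learning)
    "gamma": 0.99,   # discount factor (standard usage of gamma)
    "rho_th": 4,     # set for the inference problem (Problem 1)
    "zeta": 0.95,    # set for the inference problem (Problem 1)
    "lambda": 0.01,  # set for Algorithm 1
    "h_max": 2,      # set for Algorithm 1
}
```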