Transfer Reinforcement Learning with Shared Dynamics

Authors: Romain Laroche, Merwan Barlier

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method is tested on a navigation task, under four Transfer RL experimental settings: with a known reward function, with strong and weak expert knowledge on the reward function, and with a completely unknown reward function. It is also evaluated in a Multi-Task RL experiment and compared with the state-of-the-art algorithms. Results reveal that this method constitutes a major improvement for transfer/multi-task problems that share dynamics.
Researcher Affiliation | Collaboration | Romain Laroche (Orange Labs, Châtillon, France; Maluuba, Montréal, Canada; romain.laroche@m4x.org) and Merwan Barlier (Orange Labs, Châtillon, France; Univ. Lille 1, UMR 9189 CRIStAL, France; merwan.barlier@orange.com)
Pseudocode | Yes | Algorithm 1: Transition reuse algorithm. (A hedged sketch of the transition-reuse idea is given after the table.)
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described, nor does it explicitly state that code is released.
Open Datasets | No | The paper states: 'First, 25000 transitions are generated with a random policy and stored in Ξ.' and 'The 25 tasks are run in turns: one trajectory of 50 transitions of each task is generated with the current policy and is stored in Ξτ, until collecting 20 trajectories from each task, i.e. 1,000 transitions from each task and 25,000 in total.' This indicates a custom dataset generation, not access to a publicly available or open dataset.
Dataset Splits | No | The paper describes how data is generated for learning (e.g., 1,000 or 25,000 transitions) but does not specify explicit train/validation/test dataset splits with percentages, counts, or citations to predefined splits.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions 'linear regression with a Tikhonov regularisation' and 'Fitted-Q Iteration' but does not provide specific software library names with version numbers. (Both techniques appear in the hedged sketch after the table.)
Experiment Setup | Yes | The state representation S is the agent's real-valued coordinates s_t = {x_t, y_t} ∈ (0, 5)^2, and the set of 25 features Φ(s_t) is defined with 5×5 Gaussian radial basis functions placed at s_ij = {i − 0.5, j − 0.5} for i, j ∈ {1, …, 5}, computed with the SS similarity with σ = 0.2: φ_ij(s_t) = SS(s_t, s_ij) (Eq. 11). At each time step, the agent selects an action among four possibilities: A = {NORTH, WEST, SOUTH, EAST}. P is defined as follows for the NORTH action: x_{t+1} ∼ x_t + N(0, 0.25) and y_{t+1} ∼ y_t − 1 + N(0, 0.5), where N(μ, ν) is the Gaussian distribution with centre μ and standard deviation ν. The stochastic reward function R_{τ_ij} is corrupted with a strong noise and is defined for each task τ_ij with i, j ∈ {1, …, 5} as follows: 1 + N(0, 1) if i = ⌈x_t⌉ and j = ⌈y_t⌉, N(0, 1) otherwise (Eq. 12). We use linear regression with Tikhonov regularisation and λ = 1. (A hedged sketch of this environment is given below.)
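
To ground the Experiment Setup row, below is a minimal Python sketch of the quoted navigation task. Anything not quoted above is an assumption: the SS similarity is taken to be a Gaussian kernel exp(−||s − c||² / (2σ²)), the WEST/SOUTH/EAST dynamics are assumed symmetric to the quoted NORTH dynamics, the agent is assumed to be clipped back into (0, 5)², and the goal-cell test uses the ceiling of the coordinates. Names such as NavigationTask, rbf_features, and ss_similarity are illustrative, not the authors' code.

```python
import numpy as np

SIGMA = 0.2  # RBF width sigma quoted in the Experiment Setup row
CENTRES = np.array([[i - 0.5, j - 0.5] for i in range(1, 6) for j in range(1, 6)])

def ss_similarity(s, c, sigma=SIGMA):
    """Assumed form of the SS similarity: Gaussian kernel exp(-||s - c||^2 / (2 sigma^2))."""
    return float(np.exp(-np.sum((np.asarray(s) - c) ** 2) / (2.0 * sigma ** 2)))

def rbf_features(s):
    """The 25 features phi_ij(s) = SS(s, s_ij) on the 5x5 grid of centres."""
    return np.array([ss_similarity(s, c) for c in CENTRES])

# Drift per action; NORTH matches the quoted dynamics (y decreases by 1),
# the other three directions are assumed symmetric since only NORTH is quoted.
ACTIONS = {"NORTH": (0.0, -1.0), "SOUTH": (0.0, 1.0), "WEST": (-1.0, 0.0), "EAST": (1.0, 0.0)}

class NavigationTask:
    """Illustrative task tau_ij: dynamics shared by all tasks,
    reward 1 + N(0, 1) in the goal cell (i, j) and N(0, 1) elsewhere."""

    def __init__(self, goal_i, goal_j, rng=None):
        self.goal = (goal_i, goal_j)
        self.rng = rng if rng is not None else np.random.default_rng()

    def reward(self, s):
        in_goal_cell = (int(np.ceil(s[0])), int(np.ceil(s[1]))) == self.goal
        return (1.0 if in_goal_cell else 0.0) + self.rng.normal(0.0, 1.0)

    def step(self, s, action):
        dx, dy = ACTIONS[action]
        # Std 0.5 along the direction of motion, 0.25 orthogonal to it (as quoted for NORTH).
        noise_x = self.rng.normal(0.0, 0.5 if dx != 0.0 else 0.25)
        noise_y = self.rng.normal(0.0, 0.5 if dy != 0.0 else 0.25)
        s_next = np.clip(np.asarray(s) + [dx + noise_x, dy + noise_y], 1e-6, 5.0 - 1e-6)
        return s_next, self.reward(s)  # reward is quoted as a function of the current state s_t
```

With an environment of this shape, the data-collection protocol quoted in the Open Datasets row amounts to rolling out a uniformly random policy and storing each observed transition in the shared set Ξ.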
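
The Pseudocode and Software Dependencies rows mention a transition reuse algorithm, Fitted-Q Iteration, and Tikhonov-regularised linear regression (λ = 1). The sketch below is not a reproduction of the paper's Algorithm 1; it only illustrates, under stated assumptions, the idea the title points at: because all tasks share the dynamics P, transitions (s, a, s') gathered on any task can be reused when fitting a Q-function for a new task, with only the reward re-estimated for that task. The discount factor GAMMA, the per-action linear Q-model, the linear reward model reward_weights, and the function names are assumptions made for illustration.

```python
import numpy as np

GAMMA = 0.9    # assumed discount factor (not quoted in the table above)
LAMBDA = 1.0   # Tikhonov regularisation weight quoted in the paper

def ridge_fit(Phi, y, lam=LAMBDA):
    """Tikhonov-regularised linear regression: w = (Phi^T Phi + lam*I)^{-1} Phi^T y."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)

def fitted_q_iteration(transitions, reward_weights, actions, features, n_iter=50):
    """Fitted-Q Iteration over reused transitions (s, a, s_next), one linear Q-model per action.

    `transitions` plays the role of the shared set Xi: its (s, a, s_next) tuples may come from
    any task, because the dynamics are shared. `reward_weights` is a linear reward model for the
    *current* task only (an assumption made here for illustration).
    """
    Phi = {a: np.array([features(s) for s, a_, _ in transitions if a_ == a]) for a in actions}
    Phi_next = {a: np.array([features(s2) for _, a_, s2 in transitions if a_ == a]) for a in actions}
    d = len(features(transitions[0][0]))        # feature dimension (25 RBFs in the paper's setup)
    w = {a: np.zeros(d) for a in actions}       # Q(s, a) = phi(s) . w[a]
    for _ in range(n_iter):
        new_w = {}
        for a in actions:
            r = Phi[a] @ reward_weights                       # reward re-estimated for the current task
            next_v = np.max(np.stack([Phi_next[a] @ w[b] for b in actions], axis=1), axis=1)
            new_w[a] = ridge_fit(Phi[a], r + GAMMA * next_v)  # target: r + gamma * max_b Q(s', b)
        w = new_w
    return w
```

A reward model for a new task could be obtained with the same ridge_fit applied to features and observed rewards collected on that task, while the (s, a, s') tuples themselves can come from the shared pool of random-policy transitions described in the Open Datasets row.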