Transfer Reinforcement Learning with Shared Dynamics

Authors: Romain Laroche, Merwan Barlier

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method is tested on a navigation task, under four Transfer RL experimental settings: with a known reward function, with strong and weak expert knowledge on the reward function, and with a completely unknown reward function. It is also evaluated in a Multi-Task RL experiment and compared with the state-of-the-art algorithms. Results reveal that this method constitutes a major improvement for transfer/multi-task problems that share dynamics.
Researcher Affiliation | Collaboration | Romain Laroche (Orange Labs, Châtillon, France; Maluuba, Montréal, Canada; romain.laroche@m4x.org) and Merwan Barlier (Orange Labs, Châtillon, France; Univ. Lille 1, UMR 9189 CRIStAL, France; merwan.barlier@orange.com)
Pseudocode | Yes | Algorithm 1: Transition reuse algorithm. (A hedged sketch of the transition-reuse idea is given after the table.)
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described, nor does it explicitly state that code is released.
Open Datasets | No | The paper states: 'First, 25000 transitions are generated with a random policy and stored in Ξ.' and 'The 25 tasks are run in turns: one trajectory of 50 transitions of each task is generated with the current policy and is stored in Ξτ, until collecting 20 trajectories from each task, i.e. 1,000 transitions from each task and 25,000 in total.' This indicates a custom dataset generation, not access to a publicly available or open dataset.
Dataset Splits | No | The paper describes how data is generated for learning (e.g., 1,000 or 25,000 transitions) but does not specify explicit train/validation/test dataset splits with percentages, counts, or citations to predefined splits.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions 'linear regression with a Tikhonov regularisation' and 'Fitted-Q Iteration' but does not provide specific software library names with version numbers. (Both techniques appear in the hedged sketch after the table.)
Experiment Setup | Yes | The state representation S is the agent's real-valued coordinates s_t = {x_t, y_t} ∈ (0, 5)^2, and the set of 25 features Φ(s_t) is defined with 5×5 Gaussian radial basis functions placed at s_ij = {i − 0.5, j − 0.5} for i, j ∈ {1, …, 5}, computed with the SS similarity with σ = 0.2: φ_ij(s_t) = SS(s_t, s_ij) (Eq. 11). At each time step, the agent selects an action among four possibilities: A = {NORTH, WEST, SOUTH, EAST}. P is defined as follows for the NORTH action: x_{t+1} ∼ x_t + N(0, 0.25) and y_{t+1} ∼ y_t − 1 + N(0, 0.5), where N(μ, ν) is the Gaussian distribution with centre μ and standard deviation ν. The stochastic reward function R_{τ_ij} is corrupted with a strong noise and is defined for each task τ_ij with i, j ∈ {1, …, 5} as follows: 1 + N(0, 1) if i = ⌈x_t⌉ and j = ⌈y_t⌉, N(0, 1) otherwise (Eq. 12). We use linear regression with Tikhonov regularisation and λ = 1. (A hedged sketch of this environment is given below.)
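
To ground the Experiment Setup row, below is a minimal Python sketch of the quoted navigation task. Anything not quoted above is an assumption: the SS similarity is taken to be a Gaussian kernel exp(−||s − c||² / (2σ²)), the WEST/SOUTH/EAST dynamics are assumed symmetric to the quoted NORTH dynamics, the agent is assumed to be clipped back into (0, 5)², and the goal-cell test uses the ceiling of the coordinates. Names such as NavigationTask, rbf_features, and ss_similarity are illustrative, not the authors' code.

```python
import numpy as np

SIGMA = 0.2  # RBF width sigma quoted in the Experiment Setup row
CENTRES = np.array([[i - 0.5, j - 0.5] for i in range(1, 6) for j in range(1, 6)])

def ss_similarity(s, c, sigma=SIGMA):
    """Assumed form of the SS similarity: Gaussian kernel exp(-||s - c||^2 / (2 sigma^2))."""
    return float(np.exp(-np.sum((np.asarray(s) - c) ** 2) / (2.0 * sigma ** 2)))

def rbf_features(s):
    """The 25 features phi_ij(s) = SS(s, s_ij) on the 5x5 grid of centres."""
    return np.array([ss_similarity(s, c) for c in CENTRES])

# Drift per action; NORTH matches the quoted dynamics (y decreases by 1),
# the other three directions are assumed symmetric since only NORTH is quoted.
ACTIONS = {"NORTH": (0.0, -1.0), "SOUTH": (0.0, 1.0), "WEST": (-1.0, 0.0), "EAST": (1.0, 0.0)}

class NavigationTask:
    """Illustrative task tau_ij: dynamics shared by all tasks,
    reward 1 + N(0, 1) in the goal cell (i, j) and N(0, 1) elsewhere."""

    def __init__(self, goal_i, goal_j, rng=None):
        self.goal = (goal_i, goal_j)
        self.rng = rng if rng is not None else np.random.default_rng()

    def reward(self, s):
        in_goal_cell = (int(np.ceil(s[0])), int(np.ceil(s[1]))) == self.goal
        return (1.0 if in_goal_cell else 0.0) + self.rng.normal(0.0, 1.0)

    def step(self, s, action):
        dx, dy = ACTIONS[action]
        # Std 0.5 along the direction of motion, 0.25 orthogonal to it (as quoted for NORTH).
        noise_x = self.rng.normal(0.0, 0.5 if dx != 0.0 else 0.25)
        noise_y = self.rng.normal(0.0, 0.5 if dy != 0.0 else 0.25)
        s_next = np.clip(np.asarray(s) + [dx + noise_x, dy + noise_y], 1e-6, 5.0 - 1e-6)
        return s_next, self.reward(s)  # reward is quoted as a function of the current state s_t
```

With an environment of this shape, the data-collection protocol quoted in the Open Datasets row amounts to rolling out a uniformly random policy and storing each observed transition in the shared set Ξ.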
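
The Pseudocode and Software Dependencies rows mention a transition reuse algorithm, Fitted-Q Iteration, and Tikhonov-regularised linear regression (λ = 1). The sketch below is not a reproduction of the paper's Algorithm 1; it only illustrates, under stated assumptions, the idea the title points at: because all tasks share the dynamics P, transitions (s, a, s') gathered on any task can be reused when fitting a Q-function for a new task, with only the reward re-estimated for that task. The discount factor GAMMA, the per-action linear Q-model, the linear reward model reward_weights, and the function names are assumptions made for illustration.

```python
import numpy as np

GAMMA = 0.9    # assumed discount factor (not quoted in the table above)
LAMBDA = 1.0   # Tikhonov regularisation weight quoted in the paper

def ridge_fit(Phi, y, lam=LAMBDA):
    """Tikhonov-regularised linear regression: w = (Phi^T Phi + lam*I)^{-1} Phi^T y."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)

def fitted_q_iteration(transitions, reward_weights, actions, features, n_iter=50):
    """Fitted-Q Iteration over reused transitions (s, a, s_next), one linear Q-model per action.

    `transitions` plays the role of the shared set Xi: its (s, a, s_next) tuples may come from
    any task, because the dynamics are shared. `reward_weights` is a linear reward model for the
    *current* task only (an assumption made here for illustration).
    """
    Phi = {a: np.array([features(s) for s, a_, _ in transitions if a_ == a]) for a in actions}
    Phi_next = {a: np.array([features(s2) for _, a_, s2 in transitions if a_ == a]) for a in actions}
    d = len(features(transitions[0][0]))        # feature dimension (25 RBFs in the paper's setup)
    w = {a: np.zeros(d) for a in actions}       # Q(s, a) = phi(s) . w[a]
    for _ in range(n_iter):
        new_w = {}
        for a in actions:
            r = Phi[a] @ reward_weights                       # reward re-estimated for the current task
            next_v = np.max(np.stack([Phi_next[a] @ w[b] for b in actions], axis=1), axis=1)
            new_w[a] = ridge_fit(Phi[a], r + GAMMA * next_v)  # target: r + gamma * max_b Q(s', b)
        w = new_w
    return w
```

A reward model for a new task could be obtained with the same ridge_fit applied to features and observed rewards collected on that task, while the (s, a, s') tuples themselves can come from the shared pool of random-policy transitions described in the Open Datasets row.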