Importance Weighted Transfer of Samples in Reinforcement Learning

Authors: Andrea Tirinzoni, Andrea Sessa, Matteo Pirotta, Marcello Restelli

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Furthermore, we empirically compare the proposed algorithm to state-of-the-art approaches, showing that it achieves better learning performance and is very robust to negative transfer, even when some source tasks are significantly different from the target task.
Researcher Affiliation | Academia | 1Politecnico di Milano, Milan, Italy; 2SequeL Team, INRIA Lille, France. Correspondence to: Andrea Tirinzoni <andrea.tirinzoni@polimi.it>.
Pseudocode | Yes | Algorithm 1: Importance Weighted Fitted Q-Iteration (a hedged code sketch follows the table)
Open Source Code | No | The paper does not provide an explicit statement or link indicating the release of open-source code for the described methodology.
Open Datasets | Yes | Our first experimental domain is a modified version of the puddle world environment presented in (Sutton, 1996). [...] Acrobot (Sutton & Barto, 1998) is a classic control problem where the goal is to swing up a two-link pendulum...
Dataset Splits | No | The paper describes data collection in terms of 'episodes' and 'samples' for reinforcement learning tasks, but it does not specify explicit training, validation, and test splits that would make the data partitioning reproducible.
Hardware Specification | No | The paper does not contain any specific details regarding the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions software components like FQI and Extra-Trees, and uses Gaussian Processes for modeling, but it does not provide specific version numbers for these or other ancillary software components.
Experiment Setup | Yes | In each algorithm, FQI is run for 50 iterations with Extra-Trees (Ernst et al., 2005). An ϵ-greedy policy (ϵ = 0.3) is used to collect data in the target task. [...] We run all algorithms (except SDT since the problem violates the shared-dynamics assumption) for 200 episodes and average over 20 runs. (A configuration sketch follows the table.)
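The Pseudocode row points to Algorithm 1, Importance Weighted Fitted Q-Iteration. Below is a minimal sketch of such an importance-weighted FQI loop, assuming scikit-learn's Extra-Trees regressor and a precomputed importance weight per transition (in the paper these weights are estimated with Gaussian Process models of the source and target tasks); the function name, array shapes, and tree hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of importance-weighted Fitted Q-Iteration (not the authors' code).
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def iw_fqi(states, actions_taken, rewards, next_states, weights,
           action_set, gamma=0.99, n_iterations=50):
    """Fitted Q-Iteration with per-sample importance weights.

    states, next_states: (N, d) arrays; actions_taken, rewards, weights: (N,)
    arrays; action_set: list of discrete actions. Terminal-state handling is
    omitted for brevity.
    """
    sa = np.column_stack([states, actions_taken])
    q = None
    for _ in range(n_iterations):
        if q is None:
            targets = rewards  # first iteration regresses on immediate rewards
        else:
            # Bellman backup: r + gamma * max_a' Q(s', a')
            next_q = np.column_stack([
                q.predict(np.column_stack(
                    [next_states, np.full(len(next_states), a)]))
                for a in action_set])
            targets = rewards + gamma * next_q.max(axis=1)
        q = ExtraTreesRegressor(n_estimators=50, min_samples_split=5)
        # Importance weights enter through sample_weight: source transitions
        # that are unlikely under the target task contribute little to the fit.
        q.fit(sa, targets, sample_weight=weights)
    return q
```

Routing the weights through `sample_weight` is one natural way to let improbable source transitions carry little influence in each regression, which is consistent with the robustness to negative transfer claimed in the abstract.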
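For the Experiment Setup row, a small configuration and ϵ-greedy data-collection sketch is shown below, in the same assumed Python/scikit-learn setting; any value or interface not quoted in the row (environment handling, number of trees, seeding) is an assumption.

```python
# Illustrative configuration matching the quoted setup; unquoted values are assumptions.
import numpy as np

CONFIG = {
    "fqi_iterations": 50,  # "FQI is run for 50 iterations"
    "epsilon": 0.3,        # epsilon-greedy exploration in the target task
    "episodes": 200,       # "for 200 episodes"
    "runs": 20,            # "average over 20 runs"
}

def epsilon_greedy_action(q, state, action_set, epsilon, rng):
    """Return a random action with probability epsilon, else the greedy one
    under the current Q estimate (e.g., the regressor returned by iw_fqi)."""
    if q is None or rng.random() < epsilon:
        return action_set[rng.integers(len(action_set))]
    values = [q.predict(np.append(state, a).reshape(1, -1))[0]
              for a in action_set]
    return action_set[int(np.argmax(values))]

# Example usage (hypothetical two-dimensional state, three discrete actions):
# rng = np.random.default_rng(0)
# a = epsilon_greedy_action(q=None, state=np.zeros(2), action_set=[0, 1, 2],
#                           epsilon=CONFIG["epsilon"], rng=rng)
```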