Importance Weighted Transfer of Samples in Reinforcement Learning
Authors: Andrea Tirinzoni, Andrea Sessa, Matteo Pirotta, Marcello Restelli
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Furthermore, we empirically compare the proposed algorithm to state-of-the-art approaches, showing that it achieves better learning performance and is very robust to negative transfer, even when some source tasks are significantly different from the target task. |
| Researcher Affiliation | Academia | 1 Politecnico di Milano, Milan, Italy; 2 SequeL Team, INRIA Lille, France. Correspondence to: Andrea Tirinzoni <andrea.tirinzoni@polimi.it>. |
| Pseudocode | Yes | Algorithm 1 Importance Weighted Fitted Q-Iteration |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | Our first experimental domain is a modified version of the puddle world environment presented in (Sutton, 1996). [...] Acrobot (Sutton & Barto, 1998) is a classic control problem where the goal is to swing-up a two-link pendulum... |
| Dataset Splits | No | The paper discusses data collection in terms of 'episodes' and 'samples' for reinforcement learning tasks, but it does not specify explicit training/validation/test dataset splits, so the data partitioning cannot be reproduced as it would be for a supervised learning setup. |
| Hardware Specification | No | The paper does not contain any specific details regarding the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions software components like FQI and Extra-Trees, and uses Gaussian Processes for modeling, but it does not provide specific version numbers for these or other ancillary software components. |
| Experiment Setup | Yes | In each algorithm, FQI is run for 50 iterations with Extra-Trees (Ernst et al., 2005). An ϵ-greedy policy (ϵ = 0.3) is used to collect data in the target task. [...] We run all algorithms (except SDT since the problem violates the shared-dynamics assumption) for 200 episodes and average over 20 runs. |
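The Pseudocode and Experiment Setup rows above describe Importance Weighted Fitted Q-Iteration run for 50 iterations with Extra-Trees, with target-task data collected by an ϵ-greedy policy (ϵ = 0.3) over 200 episodes and results averaged over 20 runs. The sketch below illustrates how such a weighted FQI loop and behaviour policy could look; it is not the authors' implementation. The discount factor, the use of scikit-learn's ExtraTreesRegressor in place of the original Extra-Trees, and the helper names (`weighted_fqi`, `epsilon_greedy_action`) are assumptions made for illustration.

```python
# Minimal sketch of the quoted experiment setup (not the authors' code).
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

N_FQI_ITERATIONS = 50   # "FQI is run for 50 iterations" (Experiment Setup row)
EPSILON = 0.3           # epsilon-greedy exploration in the target task (Experiment Setup row)
GAMMA = 0.99            # discount factor: an assumption, not stated in the quoted text

rng = np.random.default_rng(0)


def weighted_fqi(states, actions, rewards, next_states, weights, n_actions,
                 gamma=GAMMA, n_iterations=N_FQI_ITERATIONS):
    """Fitted Q-Iteration with per-sample weights (a sketch under the assumptions above).

    `weights` would carry the importance weights assigned to transferred source
    samples; target-task samples would simply get weight 1.
    """
    X = np.column_stack([states, actions])
    q = None
    for _ in range(n_iterations):
        if q is None:
            targets = rewards                     # first iteration regresses on immediate rewards
        else:
            # Bootstrap target: r + gamma * max_a' Q(s', a')
            q_next = np.column_stack([
                q.predict(np.column_stack([next_states, np.full(len(rewards), a)]))
                for a in range(n_actions)
            ])
            targets = rewards + gamma * q_next.max(axis=1)
        q = ExtraTreesRegressor(n_estimators=50)  # stand-in for Extra-Trees (Ernst et al., 2005)
        q.fit(X, targets, sample_weight=weights)  # importance weights enter as sample weights
    return q


def epsilon_greedy_action(q, state, n_actions, epsilon=EPSILON):
    """Epsilon-greedy behaviour policy (epsilon = 0.3) for collecting target-task data."""
    if q is None or rng.random() < epsilon:
        return int(rng.integers(n_actions))
    q_values = [q.predict(np.append(state, a)[None, :])[0] for a in range(n_actions)]
    return int(np.argmax(q_values))
```

In the reported experiments this kind of loop would be repeated for 200 episodes per run and averaged over 20 independent runs; those outer loops and the computation of the importance weights themselves are omitted here.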