Importance Weighted Transfer of Samples in Reinforcement Learning
Authors: Andrea Tirinzoni, Andrea Sessa, Matteo Pirotta, Marcello Restelli
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Furthermore, we empirically compare the proposed algorithm to state-of-the-art approaches, showing that it achieves better learning performance and is very robust to negative transfer, even when some source tasks are significantly different from the target task. |
| Researcher Affiliation | Academia | 1 Politecnico di Milano, Milan, Italy; 2 SequeL Team, INRIA Lille, France. Correspondence to: Andrea Tirinzoni <andrea.tirinzoni@polimi.it>. |
| Pseudocode | Yes | Algorithm 1 Importance Weighted Fitted Q-Iteration |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | Our first experimental domain is a modified version of the puddle world environment presented in (Sutton, 1996). [...] Acrobot (Sutton & Barto, 1998) is a classic control problem where the goal is to swing-up a two-link pendulum... |
| Dataset Splits | No | The paper discusses data collection in terms of 'episodes' and 'samples' for reinforcement learning tasks, but it does not specify explicit training/validation/test dataset splits, so the data partitioning cannot be reproduced as it would be for a supervised learning setup. |
| Hardware Specification | No | The paper does not contain any specific details regarding the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions software components like FQI and Extra-Trees, and uses Gaussian Processes for modeling, but it does not provide specific version numbers for these or other ancillary software components. |
| Experiment Setup | Yes | In each algorithm, FQI is run for 50 iterations with Extra-Trees (Ernst et al., 2005). An ϵ-greedy policy (ϵ = 0.3) is used to collect data in the target task. [...] We run all algorithms (except SDT since the problem violates the shared-dynamics assumption) for 200 episodes and average over 20 runs. |
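The Pseudocode and Experiment Setup rows above describe Importance Weighted Fitted Q-Iteration run for 50 iterations with Extra-Trees, with target-task data collected by an ϵ-greedy policy (ϵ = 0.3) over 200 episodes and results averaged over 20 runs. The sketch below illustrates how such a weighted FQI loop and behaviour policy could look; it is not the authors' implementation. The discount factor, the use of scikit-learn's ExtraTreesRegressor in place of the original Extra-Trees, and the helper names (`weighted_fqi`, `epsilon_greedy_action`) are assumptions made for illustration.

```python
# Minimal sketch of the quoted experiment setup (not the authors' code).
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

N_FQI_ITERATIONS = 50   # "FQI is run for 50 iterations" (Experiment Setup row)
EPSILON = 0.3           # epsilon-greedy exploration in the target task (Experiment Setup row)
GAMMA = 0.99            # discount factor: an assumption, not stated in the quoted text

rng = np.random.default_rng(0)


def weighted_fqi(states, actions, rewards, next_states, weights, n_actions,
                 gamma=GAMMA, n_iterations=N_FQI_ITERATIONS):
    """Fitted Q-Iteration with per-sample weights (a sketch under the assumptions above).

    `weights` would carry the importance weights assigned to transferred source
    samples; target-task samples would simply get weight 1.
    """
    X = np.column_stack([states, actions])
    q = None
    for _ in range(n_iterations):
        if q is None:
            targets = rewards                     # first iteration regresses on immediate rewards
        else:
            # Bootstrap target: r + gamma * max_a' Q(s', a')
            q_next = np.column_stack([
                q.predict(np.column_stack([next_states, np.full(len(rewards), a)]))
                for a in range(n_actions)
            ])
            targets = rewards + gamma * q_next.max(axis=1)
        q = ExtraTreesRegressor(n_estimators=50)  # stand-in for Extra-Trees (Ernst et al., 2005)
        q.fit(X, targets, sample_weight=weights)  # importance weights enter as sample weights
    return q


def epsilon_greedy_action(q, state, n_actions, epsilon=EPSILON):
    """Epsilon-greedy behaviour policy (epsilon = 0.3) for collecting target-task data."""
    if q is None or rng.random() < epsilon:
        return int(rng.integers(n_actions))
    q_values = [q.predict(np.append(state, a)[None, :])[0] for a in range(n_actions)]
    return int(np.argmax(q_values))
```

In the reported experiments this kind of loop would be repeated for 200 episodes per run and averaged over 20 independent runs; those outer loops and the computation of the importance weights themselves are omitted here.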