Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Importance Weighted Transfer of Samples in Reinforcement Learning
Authors: Andrea Tirinzoni, Andrea Sessa, Matteo Pirotta, Marcello Restelli
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Furthermore, we empirically compare the proposed algorithm to state-of-the-art approaches, showing that it achieves better learning performance and is very robust to negative transfer, even when some source tasks are significantly different from the target task. |
| Researcher Affiliation | Academia | ¹Politecnico di Milano, Milan, Italy; ²SequeL Team, INRIA Lille, France. Correspondence to: Andrea Tirinzoni <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Importance Weighted Fitted Q-Iteration |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | Our first experimental domain is a modified version of the puddle world environment presented in (Sutton, 1996). [...] Acrobot (Sutton & Barto, 1998) is a classic control problem where the goal is to swing-up a two-link pendulum... |
| Dataset Splits | No | The paper discusses data collection in terms of 'episodes' and 'samples' for reinforcement learning tasks but does not specify explicit training, validation, and test dataset splits in a way that implies reproducibility of data partitioning for supervised learning. |
| Hardware Specification | No | The paper does not contain any specific details regarding the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions software components like FQI and Extra-Trees, and uses Gaussian Processes for modeling, but it does not provide specific version numbers for these or other ancillary software components. |
| Experiment Setup | Yes | In each algorithm, FQI is run for 50 iterations with Extra-Trees (Ernst et al., 2005). An ϵ-greedy policy (ϵ = 0.3) is used to collect data in the target task. [...] We run all algorithms (except SDT since the problem violates the shared-dynamics assumption) for 200 episodes and average over 20 runs. |
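
The ϵ-greedy data collection quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code: the function name `epsilon_greedy`, the `q_values` input, and the tie-breaking rule are assumptions made for illustration. Only the values ϵ = 0.3 and 50 FQI iterations come from the quoted text.

```python
import random

# Values quoted in the Experiment Setup row.
FQI_ITERATIONS = 50
EPSILON = 0.3


def epsilon_greedy(q_values, epsilon=EPSILON, rng=random):
    """Return an action index chosen epsilon-greedily.

    q_values is a hypothetical list of estimated action values for the
    current state; the paper's own interface is not described here.
    """
    # With probability epsilon, explore with a uniformly random action.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Otherwise exploit: pick the first action with the highest estimate
    # (tie-breaking by lowest index is an assumption of this sketch).
    return q_values.index(max(q_values))
```

With `epsilon=0.0` the function always returns the greedy action, and with `epsilon=1.0` it always samples uniformly, which makes the two branches easy to check in isolation.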