Transfer of Value Functions via Variational Methods

Authors: Andrea Tirinzoni, Rafael Rodriguez Sanchez, Marcello Restelli

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We theoretically analyze them by deriving a finite-sample analysis and provide a comprehensive empirical evaluation in four different domains."
Researcher Affiliation | Academia | Andrea Tirinzoni (Politecnico di Milano, andrea.tirinzoni@polimi.it); Rafael Rodriguez Sanchez (Politecnico di Milano, rafaelalberto.rodriguez@polimi.it); Marcello Restelli (Politecnico di Milano, marcello.restelli@polimi.it)
Pseudocode | Yes | "Algorithm 1 Variational Transfer" (a hedged sketch of the transfer objective appears after this table)
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | No | The paper generates custom source tasks for its experiments (e.g., "We generate a set of 50 source tasks for the three-room environment of Figure 1 by sampling both door locations uniformly in the allowed space"). It also uses "Cartpole and Mountain Car [34]", which are well-known environments, but provides no concrete access information (links, DOIs, or specific citations) for the exact data used, especially for the generated tasks.
Dataset Splits | Yes | "We generate a set of 50 source tasks for the three-room environment of Figure 1 by sampling both door locations uniformly in the allowed space, and solve all of them by directly minimizing the TD error as presented in Section 3.4. Then, we use our algorithms to transfer from 10 source tasks sampled from the previously generated set. The average return over the last 50 learning episodes as a function of the number of iterations is shown in Figure 2a. Each curve is the result of 20 independent runs, each one resampling the target and source tasks, with 95% confidence intervals." (see the aggregation sketch after this table)
Hardware Specification | Yes | "We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40cm, Titan XP and Tesla V100 used for this research."
Software Dependencies | No | The paper mentions software components and techniques (e.g., DDQN, neural networks, MAML) but does not provide specific version numbers for any programming languages, libraries, or frameworks used (e.g., Python 3.x, TensorFlow 2.x, PyTorch 1.x).
Experiment Setup | Yes | "We parameterize Q-functions using neural networks with one layer of 32 hidden units for Cartpole and 64 for Mountain Car. A better description of these two environments and their parameters is given in Appendix C.2. In this experiment, we use a Double Deep Q-Network (DDQN) [38] to provide a stronger no-transfer baseline for comparison. ... For this experiment, we design a set of 20 different mazes and solve them using a DDQN with two layers of 32 neurons and ReLU activations. ... We report the detailed parameters, together with additional results, in Appendix C." (see the architecture sketch after this table)
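The pseudocode entry above refers to the paper's Algorithm 1 (Variational Transfer). As a rough illustration of the idea, the following is a minimal sketch, assuming a diagonal Gaussian variational posterior over the flattened weights of a one-hidden-layer Q-network, a Gaussian prior fitted to solved source-task weights, and a reparameterized TD-error-plus-KL objective. All shapes, hyperparameters (`GAMMA`, `kl_weight`), and the random stand-in source weights are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of a variational transfer objective (Gaussian variant).
import torch

STATE_DIM, N_ACTIONS, HIDDEN, GAMMA = 4, 2, 32, 0.99

# Layer shapes of a one-hidden-layer Q-network, flattened into one vector.
SHAPES = [(HIDDEN, STATE_DIM), (HIDDEN,), (N_ACTIONS, HIDDEN), (N_ACTIONS,)]
SIZES = [torch.Size(s).numel() for s in SHAPES]
DIM = sum(SIZES)

def q_values(w, states):
    """Apply the Q-network whose weights are the flat vector w."""
    W1, b1, W2, b2 = [t.view(s) for t, s in zip(w.split(SIZES), SHAPES)]
    h = torch.relu(states @ W1.t() + b1)
    return h @ W2.t() + b2

# Diagonal Gaussian prior fitted to the weights of solved source tasks
# (random stand-ins here; the paper fits these from real source solutions).
source_ws = torch.randn(10, DIM)
prior_mu, prior_sigma = source_ws.mean(0), source_ws.std(0) + 1e-3

# Variational posterior over the target task's weights, initialized at the prior.
mu = torch.nn.Parameter(prior_mu.clone())
log_sigma = torch.nn.Parameter(prior_sigma.log().clone())
opt = torch.optim.Adam([mu, log_sigma], lr=1e-3)

def loss_fn(s, a, r, s2, done, kl_weight=1e-4):
    w = mu + log_sigma.exp() * torch.randn(DIM)  # reparameterization trick
    q = q_values(w, s).gather(1, a[:, None]).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * (1.0 - done) * q_values(w, s2).max(1).values
    td_error = ((q - target) ** 2).mean()
    # Closed-form KL(posterior || prior) between diagonal Gaussians.
    kl = (prior_sigma.log() - log_sigma
          + (log_sigma.exp() ** 2 + (mu - prior_mu) ** 2) / (2 * prior_sigma ** 2)
          - 0.5).sum()
    return td_error + kl_weight * kl

# One illustrative update on a random batch of transitions.
B = 32
s, s2 = torch.randn(B, STATE_DIM), torch.randn(B, STATE_DIM)
a = torch.randint(N_ACTIONS, (B,))
r, done = torch.randn(B), torch.zeros(B)
opt.zero_grad(); loss_fn(s, a, r, s2, done).backward(); opt.step()
```

Training would repeat this update over replay batches from the target task, so the posterior trades off fitting the target's TD error against staying close to the source-task prior.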
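The dataset-splits entry describes the evaluation protocol: 20 independent runs, each reporting the average return over the last 50 learning episodes, with 95% confidence intervals. Below is a minimal sketch of that aggregation, assuming a per-run array of episode returns and a normal-approximation interval; the paper does not state which interval estimator it uses, and the random returns are stand-ins.

```python
# Hypothetical aggregation of learning curves across independent runs.
import numpy as np

N_RUNS, N_EPISODES, WINDOW = 20, 500, 50
rng = np.random.default_rng(0)
returns = rng.normal(size=(N_RUNS, N_EPISODES))  # returns[run, episode]

# Smooth each run with a moving average over the last 50 episodes.
kernel = np.ones(WINDOW) / WINDOW
curves = np.array([np.convolve(r, kernel, mode="valid") for r in returns])

mean = curves.mean(axis=0)
# 95% confidence interval under a normal approximation (z = 1.96).
half_width = 1.96 * curves.std(axis=0, ddof=1) / np.sqrt(N_RUNS)
lower, upper = mean - half_width, mean + half_width
```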
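The experiment-setup entry pins down the network architectures. The definitions below sketch those shapes in PyTorch; the state and action dimensions are assumptions taken from the classic Cartpole and Mountain Car formulations, not values stated in the paper.

```python
# Hypothetical PyTorch modules matching the architectures quoted above.
import torch.nn as nn

def q_network(state_dim: int, n_actions: int, hidden: int) -> nn.Sequential:
    """One hidden layer: 32 units for Cartpole, 64 for Mountain Car."""
    return nn.Sequential(
        nn.Linear(state_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, n_actions),
    )

def maze_ddqn(state_dim: int, n_actions: int) -> nn.Sequential:
    """Two hidden layers of 32 ReLU units, as quoted for the maze tasks."""
    return nn.Sequential(
        nn.Linear(state_dim, 32), nn.ReLU(),
        nn.Linear(32, 32), nn.ReLU(),
        nn.Linear(32, n_actions),
    )

cartpole_q = q_network(state_dim=4, n_actions=2, hidden=32)
mountain_car_q = q_network(state_dim=2, n_actions=3, hidden=64)
```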