Transfer of Value Functions via Variational Methods
Authors: Andrea Tirinzoni, Rafael Rodriguez Sanchez, Marcello Restelli
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We theoretically analyze them by deriving a finite-sample analysis and provide a comprehensive empirical evaluation in four different domains. |
| Researcher Affiliation | Academia | Andrea Tirinzoni (Politecnico di Milano, andrea.tirinzoni@polimi.it); Rafael Rodriguez Sanchez (Politecnico di Milano, rafaelalberto.rodriguez@polimi.it); Marcello Restelli (Politecnico di Milano, marcello.restelli@polimi.it) |
| Pseudocode | Yes | Algorithm 1 Variational Transfer |
| Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for the described methodology. |
| Open Datasets | No | The paper mentions generating custom source tasks for its experiments (e.g., "We generate a set of 50 source tasks for the three-room environment of Figure 1 by sampling both door locations uniformly in the allowed space"). It also mentions using "Cartpole and Mountain Car [34]", which are well-known environments, but does not provide concrete access information (links, DOIs, specific citations) for the exact datasets used to train or test models, especially for the generated data. |
| Dataset Splits | Yes | We generate a set of 50 source tasks for the three-room environment of Figure 1 by sampling both door locations uniformly in the allowed space, and solve all of them by directly minimizing the TD error as presented in Section 3.4. Then, we use our algorithms to transfer from 10 source tasks sampled from the previously generated set. The average return over the last 50 learning episodes as a function of the number of iterations is shown in Figure 2a. Each curve is the result of 20 independent runs, each one resampling the target and source tasks, with 95% confidence intervals. (A sketch of this sampling protocol follows the table.) |
| Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40c, Titan Xp and Tesla V100 used for this research. |
| Software Dependencies | No | The paper mentions software components and techniques (e.g., DDQN, neural networks, MAML) but does not provide specific version numbers for any programming languages, libraries, or frameworks used (e.g., Python 3.x, TensorFlow 2.x, PyTorch 1.x). |
| Experiment Setup | Yes | We parameterize Q-functions using neural networks with one layer of 32 hidden units for Cartpole and 64 for Mountain Car. A better description of these two environments and their parameters is given in Appendix C.2. In this experiment, we use a Double Deep Q-Network (DDQN) [38] to provide a stronger no-transfer baseline for comparison. ... For this experiment, we design a set of 20 different mazes and solve them using a DDQN with two layers of 32 neurons and ReLU activations. ... We report the detailed parameters, together with additional results, in Appendix C. (A sketch of these architectures follows the table.) |
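The "Dataset Splits" row describes the source-task generation protocol for the three-room environment. Below is a minimal sketch of that protocol, assuming a task is fully specified by its two door positions; `DOOR_RANGE`, the task representation, and the seed are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Hedged sketch of the protocol quoted in the "Dataset Splits" row:
# 50 three-room tasks with door locations sampled uniformly, from which
# 10 source tasks are drawn for each transfer run.
DOOR_RANGE = (0.0, 10.0)  # assumed allowed span for a door position
rng = np.random.default_rng(seed=0)

def sample_task() -> dict:
    """Sample one three-room task by drawing both door locations uniformly."""
    door1, door2 = rng.uniform(*DOOR_RANGE, size=2)
    return {"door1": float(door1), "door2": float(door2)}

# Pool of 50 source tasks (the paper solves each by minimizing the TD error).
source_pool = [sample_task() for _ in range(50)]

# Per run (20 independent runs in the paper), resample 10 source tasks.
idx = rng.choice(len(source_pool), size=10, replace=False)
transfer_sources = [source_pool[i] for i in idx]
```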
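Similarly, the "Experiment Setup" row fixes the Q-network layer sizes. The PyTorch sketch below instantiates those sizes; the input and action dimensions are the standard ones for these environments, and the ReLU activation in the one-hidden-layer networks is an assumption (the quote only specifies ReLU for the maze DDQN).

```python
import torch.nn as nn

def q_network(state_dim: int, n_actions: int, hidden: int) -> nn.Module:
    """One-hidden-layer Q-network: 32 units for Cartpole, 64 for Mountain Car.
    The ReLU activation here is an assumption; the paper does not specify it."""
    return nn.Sequential(
        nn.Linear(state_dim, hidden),
        nn.ReLU(),
        nn.Linear(hidden, n_actions),
    )

def maze_q_network(state_dim: int, n_actions: int) -> nn.Module:
    """Two-hidden-layer DDQN architecture (32 neurons each, ReLU) for the mazes."""
    return nn.Sequential(
        nn.Linear(state_dim, 32),
        nn.ReLU(),
        nn.Linear(32, 32),
        nn.ReLU(),
        nn.Linear(32, n_actions),
    )

# Standard Gym dimensions for these tasks (assumed, not quoted from the paper).
cartpole_q = q_network(state_dim=4, n_actions=2, hidden=32)
mountain_car_q = q_network(state_dim=2, n_actions=3, hidden=64)
```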