Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN
Authors: Dror Freirich, Tzahi Shimkin, Ron Meir, Aviv Tamar
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we demonstrate our methods empirically. Our experiments are designed to address the following questions: (1) Can the VDGL algorithm learn accurate distributions of multivariate returns? (2) Does the W-1ME algorithm result in effective exploration? |
| Researcher Affiliation | Academia | (1) The Viterbi Faculty of Electrical Engineering, Technion - Israel Institute of Technology; (2) Berkeley AI Research Lab, UC Berkeley |
| Pseudocode | Yes | Algorithm 1, Value Distribution GAN Learning (VDGL), and Algorithm 2, Distributional Discrepancy Motivated Exploration (W-1ME); a hedged sketch of a VDGL-style update appears after the table |
| Open Source Code | No | The paper does not provide an explicit statement or a link to open-source code for the methodology described. |
| Open Datasets | Yes | Cart Pole Swingup and Swimmer Gather (Houthooft et al., 2016) |
| Dataset Splits | No | The paper does not provide specific details on training, validation, and test dataset splits. |
| Hardware Specification | Yes | Support from NVIDIA Corporation with the donation of the Titan Xp GPU is also acknowledged. |
| Software Dependencies | No | The paper mentions using standard RL algorithms like DQN and TRPO, but it does not specify any software dependencies (e.g., libraries, frameworks) with version numbers. |
| Experiment Setup | Yes | We trained the VDGL algorithm for 1500 episodes of 350 steps, with ϵ-greedy exploration (ϵ = 0.05). We run over 100 independent seeds, with 1000 episodes in each experiment. For the first two tasks, we set η = 10⁻⁷, which gave the best results for both exploration methods; for Swimmer Gather, we set η = 10⁻⁴. (Sketches of the ϵ-greedy rule and the η-scaled bonus appear after the table.) |
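
Since no source code is released (see the Open Source Code row), the following is a minimal, hypothetical sketch of what a VDGL-style update could look like in PyTorch. The network architectures, the WGAN-style critic loss, and every dimension and hyperparameter below are illustrative assumptions, not the authors' implementation; in particular, the Lipschitz constraint (weight clipping or gradient penalty) that a Wasserstein critic normally needs is omitted for brevity.

```python
# Hypothetical sketch of a Bellman-GAN ("VDGL"-style) update in PyTorch.
# All shapes, losses, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM, RET_DIM, NOISE_DIM, GAMMA = 4, 2, 2, 8, 0.99

class Generator(nn.Module):
    """Maps (state, action, noise) to a sample of the multivariate return."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACT_DIM + NOISE_DIM, 64), nn.ReLU(),
            nn.Linear(64, RET_DIM))

    def forward(self, s, a, z):
        return self.net(torch.cat([s, a, z], dim=-1))

class Critic(nn.Module):
    """Scores (state, action, return-sample) triples, WGAN-style."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACT_DIM + RET_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, s, a, g):
        return self.net(torch.cat([s, a, g], dim=-1))

G, D = Generator(), Critic()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

def vdgl_step(s, a, r, s2, a2):
    """One distributional-Bellman GAN update on a batch of transitions.

    s, s2: (B, STATE_DIM); a, a2: (B, ACT_DIM); r: (B, RET_DIM).
    """
    B = s.shape[0]
    # "Real" samples come from the distributional Bellman target
    # r + gamma * Z(s', a'), drawn through the current generator.
    with torch.no_grad():
        target = r + GAMMA * G(s2, a2, torch.randn(B, NOISE_DIM))
    # Critic update: separate Bellman-target samples from generated ones.
    fake = G(s, a, torch.randn(B, NOISE_DIM)).detach()
    d_loss = D(s, a, fake).mean() - D(s, a, target).mean()
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator update: move generated return samples toward the target law.
    g_loss = -D(s, a, G(s, a, torch.randn(B, NOISE_DIM))).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

A call such as `vdgl_step(s, a, r, s2, a2)` on tensors of the shapes documented in the docstring runs one critic update and one generator update.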
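The ϵ-greedy rule quoted in the experiment setup (ϵ = 0.05) is standard. A minimal sketch, with `q_values` as an illustrative stand-in for whatever per-action scores the agent maintains:

```python
import random

def epsilon_greedy(q_values, epsilon=0.05):
    """With probability epsilon act uniformly at random, else greedily."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# e.g. epsilon_greedy([0.1, 0.7, 0.2]) returns the greedy action 1
# with probability 1 - epsilon + epsilon/3 (approx. 0.967).
```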
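Likewise, the η values in the setup scale an exploration bonus. Assuming, as the algorithm's name (W-1ME) suggests, that the bonus is a Wasserstein-1 discrepancy between return distributions, an η-scaled shaped reward might look like the sketch below. The function names and the one-dimensional restriction are assumptions for illustration; the paper's returns are multivariate.

```python
import numpy as np

def w1_empirical(x, y):
    """Wasserstein-1 between two equal-size 1-D empirical samples.

    For equal-weight samples on the real line, W1 is the mean absolute
    difference between the sorted sample vectors (matched order statistics).
    """
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

def shaped_reward(r_ext, pred_samples, target_samples, eta=1e-7):
    """Extrinsic reward plus an eta-scaled distributional-discrepancy bonus."""
    return r_ext + eta * w1_empirical(pred_samples, target_samples)
```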