Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN

Authors: Dror Freirich, Tzahi Shimkin, Ron Meir, Aviv Tamar

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "In this section we demonstrate our methods empirically. Our experiments are designed to address the following questions: (1) Can the VDGL algorithm learn accurate distributions of multivariate returns? (2) Does the W-1ME algorithm result in effective exploration?" |
| Researcher Affiliation | Academia | "¹The Viterbi Faculty of Electrical Engineering, Technion – Israel Institute of Technology; ²Berkeley AI Research Lab, UC Berkeley." |
| Pseudocode | Yes | Algorithm 1, Value Distribution GAN Learning (VDGL), and Algorithm 2, Distributional Discrepancy Motivated Exploration (W-1ME). |
| Open Source Code | No | The paper does not provide an explicit statement or a link to open-source code for the methodology described. |
| Open Datasets | Yes | Cart Pole Swingup and SwimmerGather (Houthooft et al., 2016). |
| Dataset Splits | No | The paper does not provide specific details on training, validation, and test dataset splits. |
| Hardware Specification | Yes | "Support from NVIDIA Corporation with the donation of the Titan Xp GPU is also acknowledged." |
| Software Dependencies | No | The paper mentions using standard RL algorithms such as DQN and TRPO, but it does not specify software dependencies (e.g., libraries or frameworks) with version numbers. |
| Experiment Setup | Yes | "We trained the VDGL algorithm for 1500 episodes of 350 steps," with ε-greedy exploration (ε = 0.05); "We run over 100 independent seeds, with 1000 episodes at each experiment."; "For the first two tasks, we set η = 10⁻⁷, which gave the best results for both exploration methods. For SwimmerGather, we set η = 10⁻⁴." |
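The ε-greedy scheme quoted in the experiment setup (ε = 0.05) can be sketched as follows. This is a minimal illustration only: the `q_values` list stands in for any action-value estimate and is not the paper's VDGL implementation.

```python
import random

def epsilon_greedy(q_values, epsilon=0.05, rng=random):
    """ε-greedy action selection: with probability epsilon pick a
    uniformly random action, otherwise pick the greedy (argmax) action.

    q_values: sequence of estimated action values (hypothetical placeholder
    for whatever Q-estimate the agent maintains).
    """
    if rng.random() < epsilon:
        # Explore: uniform random action index.
        return rng.randrange(len(q_values))
    # Exploit: index of the maximal estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With epsilon = 0 the choice is always greedy.
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))
```

At ε = 0.05, the agent takes a random action on roughly 5% of steps, which matches the exploration setting quoted above.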