Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN

Authors: Dror Freirich, Tzahi Shimkin, Ron Meir, Aviv Tamar

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "In this section we demonstrate our methods empirically. Our experiments are designed to address the following questions: (1) Can the VDGL algorithm learn accurate distributions of multivariate returns? (2) Does the W-1ME algorithm result in effective exploration?" |
| Researcher Affiliation | Academia | "¹The Viterbi Faculty of Electrical Engineering, Technion – Israel Institute of Technology; ²Berkeley AI Research Lab, UC Berkeley." |
| Pseudocode | Yes | Algorithm 1, Value Distribution GAN Learning (VDGL), and Algorithm 2, Distributional Discrepancy Motivated Exploration (W-1ME). |
| Open Source Code | No | The paper does not provide an explicit statement or a link to open-source code for the methodology described. |
| Open Datasets | Yes | Cart Pole Swingup and SwimmerGather (Houthooft et al., 2016). |
| Dataset Splits | No | The paper does not provide specific details on training, validation, and test dataset splits. |
| Hardware Specification | Yes | "Support from NVIDIA Corporation with the donation of the Titan Xp GPU is also acknowledged." |
| Software Dependencies | No | The paper mentions using standard RL algorithms such as DQN and TRPO, but it does not specify software dependencies (e.g., libraries or frameworks) with version numbers. |
| Experiment Setup | Yes | "We trained the VDGL algorithm for 1500 episodes of 350 steps," with ε-greedy exploration (ε = 0.05); "We run over 100 independent seeds, with 1000 episodes at each experiment."; "For the first two tasks, we set η = 10⁻⁷, which gave the best results for both exploration methods. For SwimmerGather, we set η = 10⁻⁴." |
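The ε-greedy scheme quoted in the experiment setup (ε = 0.05) can be sketched as follows. This is a minimal illustration only: the `q_values` list stands in for any action-value estimate and is not the paper's VDGL implementation.

```python
import random

def epsilon_greedy(q_values, epsilon=0.05, rng=random):
    """ε-greedy action selection: with probability epsilon pick a
    uniformly random action, otherwise pick the greedy (argmax) action.

    q_values: sequence of estimated action values (hypothetical placeholder
    for whatever Q-estimate the agent maintains).
    """
    if rng.random() < epsilon:
        # Explore: uniform random action index.
        return rng.randrange(len(q_values))
    # Exploit: index of the maximal estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With epsilon = 0 the choice is always greedy.
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))
```

At ε = 0.05, the agent takes a random action on roughly 5% of steps, which matches the exploration setting quoted above.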