Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN
Authors: Dror Freirich, Tzahi Shimkin, Ron Meir, Aviv Tamar
ICML 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we demonstrate our methods empirically. Our experiments are designed to address the following questions: (1) Can the VDGL algorithm learn accurate distributions of multivariate returns? (2) Does the W-1ME algorithm result in effective exploration? |
| Researcher Affiliation | Academia | ¹The Viterbi Faculty of Electrical Engineering, Technion - Israel Institute of Technology; ²Berkeley AI Research Lab, UC Berkeley. |
| Pseudocode | Yes | Algorithm 1 Value Distribution GAN Learning (VDGL) and Algorithm 2 Distributional Discrepancy Motivated Exploration (W-1ME) |
| Open Source Code | No | The paper does not provide an explicit statement or a link to open-source code for the methodology described. |
| Open Datasets | Yes | Cart Pole Swingup and Swimmer Gather (Houthooft et al., 2016) |
| Dataset Splits | No | The paper does not provide specific details on training, validation, and test dataset splits. |
| Hardware Specification | Yes | Support from NVIDIA Corporation with the donation of the Titan Xp GPU is also acknowledged. |
| Software Dependencies | No | The paper mentions using standard RL algorithms like DQN and TRPO, but it does not specify any software dependencies (e.g., libraries, frameworks) with version numbers. |
| Experiment Setup | Yes | We trained the VDGL algorithm for 1500 episodes of 350 steps, ϵ-greedy exploration (ϵ = 0.05); We run over 100 independent seeds, with 1000 episodes at each experiment; For the first two tasks, we set η = 10⁻⁷ which gave the best results for both exploration methods. For Swimmergather, we set η = 10⁻⁴ |
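
The Experiment Setup row quotes an ϵ-greedy exploration rate of 0.05 and a VDGL training budget of 1500 episodes of 350 steps. As a rough illustration of what those numbers mean in practice, the sketch below shows a generic ϵ-greedy action rule and the resulting step budget. It is not the authors' code: the function names and the list-of-action-values interface are assumptions made for this example, and the step count assumes no episode ends early.

```python
import random

# Values quoted in the Experiment Setup row above.
EPSILON = 0.05           # epsilon-greedy exploration rate
NUM_EPISODES = 1500      # VDGL training episodes
STEPS_PER_EPISODE = 350  # steps per episode


def epsilon_greedy(q_values, epsilon=EPSILON, rng=random):
    """Hypothetical helper: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest estimated value (first index on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


def total_training_steps(episodes=NUM_EPISODES, steps=STEPS_PER_EPISODE):
    """Upper bound on environment steps, assuming no episode terminates early."""
    return episodes * steps
```

With ϵ = 0.05, roughly one action in twenty is chosen uniformly at random; the rest follow the current value estimates.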